This is a report on the modeling of exceedances of AB411 at Goleta Beach and Carpinteria Beach, aiming to understand the influence of sediment deposition. The data used in this analysis is from the Santa Barbara County Flood Control District, the California State Water Resources Control Board, the USGS National Water Information System, and NOAA’s National Centers for Environmental Information. The data was filtered to 2010 - 2024. The analysis will focus on the relationship between sediment deposition, streamflow discharge, rainfall, and water quality exceedances. The goal is to determine if deposition events are associated with water quality (fecal indicator bacteria) exceedances, which have public health implications.
Setup
Code
library(tidyverse)library(readxl)library(lubridate)library(here)library(janitor)library(tictoc)library(ggplot2)library(plotly) # for interactive plotslibrary(nortest) # for anderson-darling normality testlibrary(tsibble)library(car) # for VIF# ---------- data directories ---------# raw data directoryraw_data_dir <-here("beach_nourishment", "wrcb_water_qual", "raw_data")# intermediate data directoryint_dir <-here("beach_nourishment", "wrcb_water_qual", "intermediate")# output data directoryoutput_dir <-here("beach_nourishment", "wrcb_water_qual", "output")
Wet Multivariate Logistic Regression Analysis
The “wet” models are really just multivariate logistic regression models that include both wet and dry days, meaning that the models are not just predicting exceedances on wet days. The models are predicting exceedances on all days, regardless of whether it rained or not. The models are predicting exceedances based on the predictors (streamflow discharge, rainfall, and sediment deposition) on the day of the exceedance.
Consolidate data
Streamflow Discharge
USGS Gauge data for San Jose Creek and Carpinteria Creek was used to determine the relationship between deposition events, rainfall, streamflow discharge, and water quality. The data was downloaded from the USGS National Water Information System https://waterdata.usgs.gov/nwis. The data was downloaded for the years 2000-2024. Discharge is provided in cubic feet per second (ft³/s).
Code
# install the dataRetrieval package from USGSlibrary(remotes)library(curl)install_github("DOI-USGS/dataRetrieval",build_vignettes =TRUE, build_opts =c("--no-resave-data","--no-manual"))# package citationcitation(package ="dataRetrieval")
To cite dataRetrieval in publications, please use:
De Cicco, L.A., Hirsch, R.M., Lorenz, D., Watkins, W.D., Johnson, M.,
2025, dataRetrieval: R packages for discovering and retrieving water
data available from Federal hydrologic web services, v.2.7.18,
doi:10.5066/P9X4L3GE
A BibTeX entry for LaTeX users is
@Manual{,
author = {Laura DeCicco and Robert Hirsch and David Lorenz and Jordan Read and Jordan Walker and Lindsay Platt and David Watkins and David Blodgett and Mike Johnson and Aliesha Krall and Lee Stanish},
title = {dataRetrieval: R packages for discovering and retrieving water data available from U.S. federal hydrologic web services},
publisher = {U.S. Geological Survey},
address = {Reston, VA},
version = {2.7.18},
institution = {U.S. Geological Survey},
year = {2025},
doi = {10.5066/P9X4L3GE},
}
Code
# load the dataRetrieval packagelibrary(dataRetrieval)whatNWISdata(siteNumber ="11120500")
agency_cd site_no station_nm site_tp_cd dec_lat_va dec_long_va
1 USGS 11120500 SAN JOSE C NR GOLETA CA ST 34.45915 -119.809
2 USGS 11120500 SAN JOSE C NR GOLETA CA ST 34.45915 -119.809
3 USGS 11120500 SAN JOSE C NR GOLETA CA ST 34.45915 -119.809
4 USGS 11120500 SAN JOSE C NR GOLETA CA ST 34.45915 -119.809
5 USGS 11120500 SAN JOSE C NR GOLETA CA ST 34.45915 -119.809
6 USGS 11120500 SAN JOSE C NR GOLETA CA ST 34.45915 -119.809
7 USGS 11120500 SAN JOSE C NR GOLETA CA ST 34.45915 -119.809
8 USGS 11120500 SAN JOSE C NR GOLETA CA ST 34.45915 -119.809
9 USGS 11120500 SAN JOSE C NR GOLETA CA ST 34.45915 -119.809
10 USGS 11120500 SAN JOSE C NR GOLETA CA ST 34.45915 -119.809
11 USGS 11120500 SAN JOSE C NR GOLETA CA ST 34.45915 -119.809
12 USGS 11120500 SAN JOSE C NR GOLETA CA ST 34.45915 -119.809
13 USGS 11120500 SAN JOSE C NR GOLETA CA ST 34.45915 -119.809
14 USGS 11120500 SAN JOSE C NR GOLETA CA ST 34.45915 -119.809
15 USGS 11120500 SAN JOSE C NR GOLETA CA ST 34.45915 -119.809
16 USGS 11120500 SAN JOSE C NR GOLETA CA ST 34.45915 -119.809
17 USGS 11120500 SAN JOSE C NR GOLETA CA ST 34.45915 -119.809
18 USGS 11120500 SAN JOSE C NR GOLETA CA ST 34.45915 -119.809
19 USGS 11120500 SAN JOSE C NR GOLETA CA ST 34.45915 -119.809
20 USGS 11120500 SAN JOSE C NR GOLETA CA ST 34.45915 -119.809
21 USGS 11120500 SAN JOSE C NR GOLETA CA ST 34.45915 -119.809
22 USGS 11120500 SAN JOSE C NR GOLETA CA ST 34.45915 -119.809
23 USGS 11120500 SAN JOSE C NR GOLETA CA ST 34.45915 -119.809
24 USGS 11120500 SAN JOSE C NR GOLETA CA ST 34.45915 -119.809
25 USGS 11120500 SAN JOSE C NR GOLETA CA ST 34.45915 -119.809
26 USGS 11120500 SAN JOSE C NR GOLETA CA ST 34.45915 -119.809
27 USGS 11120500 SAN JOSE C NR GOLETA CA ST 34.45915 -119.809
28 USGS 11120500 SAN JOSE C NR GOLETA CA ST 34.45915 -119.809
29 USGS 11120500 SAN JOSE C NR GOLETA CA ST 34.45915 -119.809
30 USGS 11120500 SAN JOSE C NR GOLETA CA ST 34.45915 -119.809
31 USGS 11120500 SAN JOSE C NR GOLETA CA ST 34.45915 -119.809
32 USGS 11120500 SAN JOSE C NR GOLETA CA ST 34.45915 -119.809
33 USGS 11120500 SAN JOSE C NR GOLETA CA ST 34.45915 -119.809
34 USGS 11120500 SAN JOSE C NR GOLETA CA ST 34.45915 -119.809
35 USGS 11120500 SAN JOSE C NR GOLETA CA ST 34.45915 -119.809
36 USGS 11120500 SAN JOSE C NR GOLETA CA ST 34.45915 -119.809
37 USGS 11120500 SAN JOSE C NR GOLETA CA ST 34.45915 -119.809
38 USGS 11120500 SAN JOSE C NR GOLETA CA ST 34.45915 -119.809
39 USGS 11120500 SAN JOSE C NR GOLETA CA ST 34.45915 -119.809
40 USGS 11120500 SAN JOSE C NR GOLETA CA ST 34.45915 -119.809
41 USGS 11120500 SAN JOSE C NR GOLETA CA ST 34.45915 -119.809
42 USGS 11120500 SAN JOSE C NR GOLETA CA ST 34.45915 -119.809
43 USGS 11120500 SAN JOSE C NR GOLETA CA ST 34.45915 -119.809
44 USGS 11120500 SAN JOSE C NR GOLETA CA ST 34.45915 -119.809
45 USGS 11120500 SAN JOSE C NR GOLETA CA ST 34.45915 -119.809
46 USGS 11120500 SAN JOSE C NR GOLETA CA ST 34.45915 -119.809
47 USGS 11120500 SAN JOSE C NR GOLETA CA ST 34.45915 -119.809
48 USGS 11120500 SAN JOSE C NR GOLETA CA ST 34.45915 -119.809
49 USGS 11120500 SAN JOSE C NR GOLETA CA ST 34.45915 -119.809
50 USGS 11120500 SAN JOSE C NR GOLETA CA ST 34.45915 -119.809
51 USGS 11120500 SAN JOSE C NR GOLETA CA ST 34.45915 -119.809
52 USGS 11120500 SAN JOSE C NR GOLETA CA ST 34.45915 -119.809
53 USGS 11120500 SAN JOSE C NR GOLETA CA ST 34.45915 -119.809
54 USGS 11120500 SAN JOSE C NR GOLETA CA ST 34.45915 -119.809
55 USGS 11120500 SAN JOSE C NR GOLETA CA ST 34.45915 -119.809
56 USGS 11120500 SAN JOSE C NR GOLETA CA ST 34.45915 -119.809
57 USGS 11120500 SAN JOSE C NR GOLETA CA ST 34.45915 -119.809
58 USGS 11120500 SAN JOSE C NR GOLETA CA ST 34.45915 -119.809
59 USGS 11120500 SAN JOSE C NR GOLETA CA ST 34.45915 -119.809
60 USGS 11120500 SAN JOSE C NR GOLETA CA ST 34.45915 -119.809
61 USGS 11120500 SAN JOSE C NR GOLETA CA ST 34.45915 -119.809
62 USGS 11120500 SAN JOSE C NR GOLETA CA ST 34.45915 -119.809
63 USGS 11120500 SAN JOSE C NR GOLETA CA ST 34.45915 -119.809
64 USGS 11120500 SAN JOSE C NR GOLETA CA ST 34.45915 -119.809
65 USGS 11120500 SAN JOSE C NR GOLETA CA ST 34.45915 -119.809
66 USGS 11120500 SAN JOSE C NR GOLETA CA ST 34.45915 -119.809
67 USGS 11120500 SAN JOSE C NR GOLETA CA ST 34.45915 -119.809
68 USGS 11120500 SAN JOSE C NR GOLETA CA ST 34.45915 -119.809
69 USGS 11120500 SAN JOSE C NR GOLETA CA ST 34.45915 -119.809
70 USGS 11120500 SAN JOSE C NR GOLETA CA ST 34.45915 -119.809
71 USGS 11120500 SAN JOSE C NR GOLETA CA ST 34.45915 -119.809
72 USGS 11120500 SAN JOSE C NR GOLETA CA ST 34.45915 -119.809
73 USGS 11120500 SAN JOSE C NR GOLETA CA ST 34.45915 -119.809
74 USGS 11120500 SAN JOSE C NR GOLETA CA ST 34.45915 -119.809
75 USGS 11120500 SAN JOSE C NR GOLETA CA ST 34.45915 -119.809
76 USGS 11120500 SAN JOSE C NR GOLETA CA ST 34.45915 -119.809
77 USGS 11120500 SAN JOSE C NR GOLETA CA ST 34.45915 -119.809
78 USGS 11120500 SAN JOSE C NR GOLETA CA ST 34.45915 -119.809
coord_acy_cd dec_coord_datum_cd alt_va alt_acy_va alt_datum_cd huc_cd
1 F NAD83 97.85 0.16 NAVD88 18060013
2 F NAD83 97.85 0.16 NAVD88 18060013
3 F NAD83 97.85 0.16 NAVD88 18060013
4 F NAD83 97.85 0.16 NAVD88 18060013
5 F NAD83 97.85 0.16 NAVD88 18060013
6 F NAD83 97.85 0.16 NAVD88 18060013
7 F NAD83 97.85 0.16 NAVD88 18060013
8 F NAD83 97.85 0.16 NAVD88 18060013
9 F NAD83 97.85 0.16 NAVD88 18060013
10 F NAD83 97.85 0.16 NAVD88 18060013
11 F NAD83 97.85 0.16 NAVD88 18060013
12 F NAD83 97.85 0.16 NAVD88 18060013
13 F NAD83 97.85 0.16 NAVD88 18060013
14 F NAD83 97.85 0.16 NAVD88 18060013
15 F NAD83 97.85 0.16 NAVD88 18060013
16 F NAD83 97.85 0.16 NAVD88 18060013
17 F NAD83 97.85 0.16 NAVD88 18060013
18 F NAD83 97.85 0.16 NAVD88 18060013
19 F NAD83 97.85 0.16 NAVD88 18060013
20 F NAD83 97.85 0.16 NAVD88 18060013
21 F NAD83 97.85 0.16 NAVD88 18060013
22 F NAD83 97.85 0.16 NAVD88 18060013
23 F NAD83 97.85 0.16 NAVD88 18060013
24 F NAD83 97.85 0.16 NAVD88 18060013
25 F NAD83 97.85 0.16 NAVD88 18060013
26 F NAD83 97.85 0.16 NAVD88 18060013
27 F NAD83 97.85 0.16 NAVD88 18060013
28 F NAD83 97.85 0.16 NAVD88 18060013
29 F NAD83 97.85 0.16 NAVD88 18060013
30 F NAD83 97.85 0.16 NAVD88 18060013
31 F NAD83 97.85 0.16 NAVD88 18060013
32 F NAD83 97.85 0.16 NAVD88 18060013
33 F NAD83 97.85 0.16 NAVD88 18060013
34 F NAD83 97.85 0.16 NAVD88 18060013
35 F NAD83 97.85 0.16 NAVD88 18060013
36 F NAD83 97.85 0.16 NAVD88 18060013
37 F NAD83 97.85 0.16 NAVD88 18060013
38 F NAD83 97.85 0.16 NAVD88 18060013
39 F NAD83 97.85 0.16 NAVD88 18060013
40 F NAD83 97.85 0.16 NAVD88 18060013
41 F NAD83 97.85 0.16 NAVD88 18060013
42 F NAD83 97.85 0.16 NAVD88 18060013
43 F NAD83 97.85 0.16 NAVD88 18060013
44 F NAD83 97.85 0.16 NAVD88 18060013
45 F NAD83 97.85 0.16 NAVD88 18060013
46 F NAD83 97.85 0.16 NAVD88 18060013
47 F NAD83 97.85 0.16 NAVD88 18060013
48 F NAD83 97.85 0.16 NAVD88 18060013
49 F NAD83 97.85 0.16 NAVD88 18060013
50 F NAD83 97.85 0.16 NAVD88 18060013
51 F NAD83 97.85 0.16 NAVD88 18060013
52 F NAD83 97.85 0.16 NAVD88 18060013
53 F NAD83 97.85 0.16 NAVD88 18060013
54 F NAD83 97.85 0.16 NAVD88 18060013
55 F NAD83 97.85 0.16 NAVD88 18060013
56 F NAD83 97.85 0.16 NAVD88 18060013
57 F NAD83 97.85 0.16 NAVD88 18060013
58 F NAD83 97.85 0.16 NAVD88 18060013
59 F NAD83 97.85 0.16 NAVD88 18060013
60 F NAD83 97.85 0.16 NAVD88 18060013
61 F NAD83 97.85 0.16 NAVD88 18060013
62 F NAD83 97.85 0.16 NAVD88 18060013
63 F NAD83 97.85 0.16 NAVD88 18060013
64 F NAD83 97.85 0.16 NAVD88 18060013
65 F NAD83 97.85 0.16 NAVD88 18060013
66 F NAD83 97.85 0.16 NAVD88 18060013
67 F NAD83 97.85 0.16 NAVD88 18060013
68 F NAD83 97.85 0.16 NAVD88 18060013
69 F NAD83 97.85 0.16 NAVD88 18060013
70 F NAD83 97.85 0.16 NAVD88 18060013
71 F NAD83 97.85 0.16 NAVD88 18060013
72 F NAD83 97.85 0.16 NAVD88 18060013
73 F NAD83 97.85 0.16 NAVD88 18060013
74 F NAD83 97.85 0.16 NAVD88 18060013
75 F NAD83 97.85 0.16 NAVD88 18060013
76 F NAD83 97.85 0.16 NAVD88 18060013
77 F NAD83 97.85 0.16 NAVD88 18060013
78 F NAD83 97.85 0.16 NAVD88 18060013
data_type_cd parm_cd stat_cd ts_id loc_web_ds medium_grp_cd parm_grp_cd
1 ad <NA> <NA> 0 NA wat <NA>
2 dv 00060 00003 8464 NA wat <NA>
3 pk <NA> <NA> 0 NA wat <NA>
4 qw <NA> <NA> 0 NA wat ALL
5 qw 00001 <NA> 0 NA wat INF
6 qw 00009 <NA> 0 NA wat INF
7 qw 00010 <NA> 0 NA wat PHY
8 qw 00020 <NA> 0 NA wat PHY
9 qw 00025 <NA> 0 NA wat PHY
10 qw 00028 <NA> 0 NA wat INF
11 qw 00060 <NA> 0 NA wat PHY
12 qw 00061 <NA> 0 NA wat PHY
13 qw 00063 <NA> 0 NA wat INF
14 qw 00065 <NA> 0 NA wat PHY
15 qw 00094 <NA> 0 NA wat PHY
16 qw 00095 <NA> 0 NA wat PHY
17 qw 00191 <NA> 0 NA wat INN
18 qw 00300 <NA> 0 NA wat INN
19 qw 00301 <NA> 0 NA wat INN
20 qw 00400 <NA> 0 NA wat PHY
21 qw 00403 <NA> 0 NA wat PHY
22 qw 00405 <NA> 0 NA wat INN
23 qw 00410 <NA> 0 NA wat INN
24 qw 00419 <NA> 0 NA wat INN
25 qw 00447 <NA> 0 NA wat INN
26 qw 00450 <NA> 0 NA wat INN
27 qw 00608 <NA> 0 NA wat NUT
28 qw 00613 <NA> 0 NA wat NUT
29 qw 00618 <NA> 0 NA wat NUT
30 qw 00620 <NA> 0 NA wat NUT
31 qw 00631 <NA> 0 NA wat NUT
32 qw 00660 <NA> 0 NA wat NUT
33 qw 00671 <NA> 0 NA wat NUT
34 qw 00900 <NA> 0 NA wat PHY
35 qw 00902 <NA> 0 NA wat PHY
36 qw 00904 <NA> 0 NA wat PHY
37 qw 00915 <NA> 0 NA wat INM
38 qw 00925 <NA> 0 NA wat INM
39 qw 00930 <NA> 0 NA wat INM
40 qw 00931 <NA> 0 NA wat INM
41 qw 00932 <NA> 0 NA wat INM
42 qw 00933 <NA> 0 NA wat INM
43 qw 00935 <NA> 0 NA wat INM
44 qw 00940 <NA> 0 NA wat INN
45 qw 00945 <NA> 0 NA wat INN
46 qw 00950 <NA> 0 NA wat INN
47 qw 00955 <NA> 0 NA wat INN
48 qw 01020 <NA> 0 NA wat IMN
49 qw 01046 <NA> 0 NA wat IMM
50 qw 01056 <NA> 0 NA wat IMM
51 qw 30207 <NA> 0 NA wat PHY
52 qw 30208 <NA> 0 NA wat PHY
53 qw 30209 <NA> 0 NA wat PHY
54 qw 31625 <NA> 0 NA wat MBI
55 qw 39086 <NA> 0 NA wat INN
56 qw 70300 <NA> 0 NA wat PHY
57 qw 70301 <NA> 0 NA wat PHY
58 qw 70302 <NA> 0 NA wat PHY
59 qw 70303 <NA> 0 NA wat PHY
60 qw 71846 <NA> 0 NA wat NUT
61 qw 71850 <NA> 0 NA wat NUT
62 qw 71851 <NA> 0 NA wat NUT
63 qw 71856 <NA> 0 NA wat NUT
64 qw 80164 <NA> 0 NA wat SED
65 qw 80165 <NA> 0 NA wat SED
66 qw 80166 <NA> 0 NA wat SED
67 qw 80167 <NA> 0 NA wat SED
68 qw 80168 <NA> 0 NA wat SED
69 qw 80169 <NA> 0 NA wat SED
70 qw 80170 <NA> 0 NA wat SED
71 qw 82398 <NA> 0 NA wat INF
72 qw 90095 <NA> 0 NA wat PHY
73 qw 90410 <NA> 0 NA wat INN
74 qw 95902 <NA> 0 NA wat PHY
75 sv <NA> <NA> 0 NA wat <NA>
76 uv 00060 <NA> 14451 NA wat <NA>
77 uv 00065 <NA> 14452 NA wat <NA>
78 uv 63160 <NA> 346201 NA wat <NA>
srs_id access_cd begin_date end_date count_nu
1 0 0 2005-01-01 2024-01-01 20
2 1645423 0 1941-01-01 2025-03-01 30478
3 0 0 1941-04-04 2024-02-19 83
4 0 0 1977-08-09 1991-09-23 154
5 0 0 1986-06-04 1986-06-04 1
6 0 0 1987-08-03 1989-09-06 6
7 1645597 0 1977-08-09 1991-09-23 148
8 1645720 0 1983-09-07 1987-09-04 3
9 1641265 0 1990-02-27 1991-03-01 2
10 0 0 1977-08-09 1991-09-23 145
11 1645423 0 1977-08-09 1977-10-03 3
12 1645415 0 1977-10-03 1991-09-23 146
13 0 0 1985-02-26 1985-02-26 2
14 17164583 0 1983-05-26 1983-09-28 4
15 1646694 0 1980-10-01 1980-10-01 1
16 1646694 0 1977-08-09 1991-09-23 149
17 1640572 0 1977-08-09 1991-03-01 137
18 154302 0 1990-02-27 1991-03-01 2
19 154302 0 1990-02-27 1991-03-01 2
20 17028275 0 1977-08-09 1991-03-01 137
21 1644335 0 1980-10-01 1991-09-23 113
22 33548 0 1978-08-02 1991-03-01 13
23 1640192 0 1978-08-02 1991-03-01 10
24 1640192 0 1990-02-27 1991-03-01 2
25 119396 0 1987-03-11 1987-12-29 2
26 4788 0 1990-02-27 1991-03-01 2
27 966655 0 1991-03-01 1991-03-01 1
28 197194 0 1991-03-01 1991-03-01 1
29 197186 0 1991-03-01 1991-03-01 1
30 197186 0 1977-08-09 1977-08-09 1
31 701177 0 1978-08-02 1991-03-01 13
32 194464 0 1978-08-02 1991-03-01 13
33 194464 0 1978-08-02 1991-03-01 13
34 1640416 0 1977-08-09 1991-03-01 14
35 1640432 0 1980-01-03 1986-01-23 3
36 1640432 0 1989-01-12 1989-01-12 1
37 150268 0 1977-08-09 1991-03-01 14
38 149617 0 1977-08-09 1991-03-01 14
39 149856 0 1977-08-09 1991-03-01 14
40 1640523 0 1977-08-09 1991-03-01 14
41 1646660 0 1977-08-09 1991-03-01 14
42 967331 0 1980-01-03 1980-01-03 1
43 149740 0 1977-08-09 1991-03-01 14
44 205955 0 1977-08-09 1991-03-01 14
45 197301 0 1977-08-09 1991-03-01 14
46 206706 0 1978-08-02 1991-03-01 13
47 151977 0 1978-08-02 1991-03-01 13
48 150011 0 1977-08-09 1991-03-01 13
49 149559 0 1978-08-02 1991-03-01 13
50 149625 0 1978-08-02 1991-03-01 10
51 17164583 0 1983-05-26 1983-09-28 4
52 0 0 1977-08-09 1977-10-03 3
53 1645415 0 1977-10-03 1991-09-23 146
54 761692 0 1987-03-11 1987-03-11 1
55 1640192 0 1989-01-12 1989-01-12 1
56 1642222 0 1977-08-09 1991-09-23 140
57 1642222 0 1978-08-02 1991-03-01 13
58 1642222 0 1977-08-09 1991-09-23 144
59 1642222 0 1977-08-09 1991-09-23 150
60 966655 0 1991-03-01 1991-03-01 1
61 197186 0 1977-08-09 1977-08-09 1
62 197186 0 1991-03-01 1991-03-01 1
63 197194 0 1991-03-01 1991-03-01 1
64 17136284 0 1985-02-26 1985-02-26 1
65 17136284 0 1985-02-26 1985-02-26 1
66 17136284 0 1985-02-26 1985-02-26 2
67 17136284 0 1985-02-26 1985-02-26 2
68 17136284 0 1985-02-26 1985-02-26 2
69 17136284 0 1985-02-26 1985-02-26 2
70 17136284 0 1985-02-26 1985-02-26 1
71 0 0 1983-11-03 1991-09-23 73
72 1646694 0 1980-11-10 1991-09-23 112
73 1640192 0 1980-12-10 1991-03-01 10
74 1640432 0 1980-12-10 1986-01-23 3
75 0 0 1941-01-07 2025-01-03 1596
76 1645423 0 1988-10-01 2025-03-02 13301
77 17164583 0 2007-10-01 2025-03-02 6362
78 1642503 0 2022-10-03 2025-03-02 881
The Arroyo Burro Beach (control site) data was obtained from the Santa Barbara Long-Term Ecological Research (SBLTER) site for 2001 to 2018-09-30. The data was downloaded from the Environmental Data Initiative here: https://portal.edirepository.org/nis/mapbrowse?packageid=knb-lter-sbc.3001.9. Discharge is provided in liters per second.
1 liter is about 0.03531470 cubic feet.
Code
# Arroyo Burro Beach (control site), from SBLTER not USGSab_creek_raw <-read.delim2(here(raw_data_dir, "sblter_arroyo_burro_discharge_2001_2018.txt"), header =TRUE, sep =",", dec =".") %>% janitor::clean_names() %>%select(timestamp_local, discharge_lps)# separate the date from the time and convert to lubridateab_creek_discharge <- ab_creek_raw %>%separate(timestamp_local, into =c("date", "time"), sep ="T") %>%mutate(date =as.Date(date)) %>%mutate(discharge_lps =ifelse(discharge_lps ==c("-999"), NA, discharge_lps)) %>%filter(date >=as.Date("2010-01-01"))# only keep the years 2010-2018# sum the discharge for each day and convert to cubic feet per secondab_creek_discharge_cfs <- ab_creek_discharge %>%group_by(date) %>%summarize(ab_discharge_lps =mean(discharge_lps)) %>%# average the discharge for each day for the mean daily dischargemutate(ab_discharge_cfs = ab_discharge_lps *0.0353146667) # convert to cubic feet per second
# Arroyo Burro control site, using NOAA NCEI data:## inland gauge sitenoaa_precip_ab_inland <-read_csv(here(raw_data_dir, "noaa_arroyo_burro_inland_daily_summary_precip.csv")) %>% janitor::clean_names() %>%mutate(date =as.Date(date)) %>%select(date, daily_rain = prcp) # rename into daily_rain## east beach gauge sitenoaa_precip_ab_east <-read_csv(here(raw_data_dir, "noaa_arroyo_burro_east_daily_summary_precip.csv")) %>% janitor::clean_names() %>%mutate(date =as.Date(date)) %>%select(date, daily_rain = prcp) # rename into daily_rain
A t-test will be used to determine if the inland and east beach gauge sites are similar. The east beach gauge site has more data (2000-2025) compared to the inland gauge site (2016 - 2025).
Code
# test whether inland and east beach gauge is similarplot_inland_and_east <-ggplot() +geom_point(data = noaa_precip_ab_inland, aes(x = date, y = daily_rain), color ="blue") +geom_line(data = noaa_precip_ab_inland, aes(x = date, y = daily_rain), color ="blue") +geom_point(data = noaa_precip_ab_east, aes(x = date, y = daily_rain), color ="red") +geom_smooth(data = noaa_precip_ab_east, aes(x = date, y = daily_rain), color ="red") +labs(title ="Comparison of Precipitation at Arroyo Burro Beach Inland and East Beach Gauge Sites",x ="Date",y ="Daily Precipitation (inches)",color ="Gauge Site") +theme_minimal()plot_inland_and_east
Code
# first, check for normality. we will use the shapiro-wilks test for the inland data, and the anderson-darling test for the east beach data (which has more than 5000 observations). See <https://www.researchgate.net/publication/267205556_Power_Comparisons_of_Shapiro-Wilk_Kolmogorov-Smirnov_Lilliefors_and_Anderson-Darling_Tests> for more information.shapiro.test(noaa_precip_ab_inland$daily_rain) # W = 0.60304, p-value < 2.2e-16, so not normal
Shapiro-Wilk normality test
data: noaa_precip_ab_inland$daily_rain
W = 0.60304, p-value < 2.2e-16
Code
ad.test(noaa_precip_ab_east$daily_rain) # A = 1780.2, p-value < 2.2e-16
Anderson-Darling normality test
data: noaa_precip_ab_east$daily_rain
A = 1780.2, p-value < 2.2e-16
# since the data is not normal, we will use the wilcoxon rank sum testwilcox.test(noaa_precip_ab_inland$daily_rain, noaa_precip_ab_east$daily_rain)
Wilcoxon rank sum test with continuity correction
data: noaa_precip_ab_inland$daily_rain and noaa_precip_ab_east$daily_rain
W = 1807158, p-value < 2.2e-16
alternative hypothesis: true location shift is not equal to 0
Code
# Wilcoxon rank sum test with continuity correction# # data: noaa_precip_ab_inland$daily_rain and noaa_precip_ab_east$daily_rain# W = 1807158, p-value < 2.2e-16# alternative hypothesis: true location shift is not equal to 0
The two sites are significantly different. The east beach gauge site will be used for the Arroyo Burro Beach precipitation data, as it has many more observations (5537) versus the inland gauge site (392). Considering that the inland site only has data from 2016-2025, it is likely that the east beach gauge site is more representative of the entire time period of interest (2010-2025). Also, it is suspicious that for almost 10 years there are only 392 observations at the inland site, which is less than 1/10th of the number of observations at the east beach site.
Let’s now see if precipitation can be used without discharge data for years after 2018.
Code
# see if precipitation and discharge are correlatedab_discharge_precip_data <- ab_creek_discharge_cfs %>%left_join(noaa_precip_ab_east, by ="date")cor(ab_discharge_precip_data$ab_discharge_cfs, ab_discharge_precip_data$daily_rain, use ="complete.obs") # 0.8446518, so they are correlated!!! Let's check further
[1] 0.8446518
Code
ggplot(ab_discharge_precip_data, aes(x = daily_rain, y = ab_discharge_cfs)) +geom_point() +geom_smooth(method ="lm") +labs(x ="Daily Precipitation (inches)", y ="Discharge (cfs)", title ="Arroyo Burro Discharge vs. Precipitation") +theme_minimal()
Code
ggplot(ab_discharge_precip_data, aes(x = date)) +geom_line(aes(y = ab_discharge_cfs, color ="Discharge")) +geom_line(aes(y = daily_rain *100, color ="Precipitation")) +scale_y_continuous(sec.axis =sec_axis(~./100, name ="Precipitation (inches)")) +labs(y ="Discharge (cfs)", title ="Discharge and Precipitation Over Time") +scale_color_manual(values =c("Discharge"="blue", "Precipitation"="red"))
Code
# consider lagged correlationslag_cor <-function(lag) { ab_discharge_precip_data %>%mutate(precip_lag =lag(daily_rain, n = lag)) %>%summarise(cor =cor(ab_discharge_cfs, precip_lag, use ="complete.obs")) %>%pull(cor)}lags <-0:10lag_cors <-map_dbl(lags, lag_cor)data.frame(lag = lags, correlation = lag_cors) %>%ggplot(aes(x = lag, y = correlation)) +geom_line() +geom_point() +labs(title ="Lagged Correlation between Precipitation and Discharge")
Code
ab_discharge_precip_data %>%mutate(month = lubridate::month(date)) %>%group_by(month) %>%summarise(correlation =cor(ab_discharge_cfs, daily_rain, use ="complete.obs")) %>%ggplot(aes(x = month, y = correlation)) +geom_line() +geom_point() +scale_x_continuous(breaks =1:12) +labs(title ="Monthly Correlation between Precipitation and Discharge")
Code
ab_discharge_precip_model <-lm(ab_discharge_cfs ~ daily_rain, data = ab_discharge_precip_data)summary(ab_discharge_precip_model)
Call:
lm(formula = ab_discharge_cfs ~ daily_rain, data = ab_discharge_precip_data)
Residuals:
Min 1Q Median 3Q Max
-69.175 -0.111 0.142 0.606 204.684
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.2361 0.1220 1.935 0.053 .
daily_rain 48.7387 0.5560 87.653 <2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 6.686 on 3086 degrees of freedom
(106 observations deleted due to missingness)
Multiple R-squared: 0.7134, Adjusted R-squared: 0.7133
F-statistic: 7683 on 1 and 3086 DF, p-value: < 2.2e-16
The adjusted R-squared value is 0.7133, indicating that about 71.33% of the variance in discharge (ab_discharge_cfs) can be explained by daily rainfall. The F-statistic (7683) with a very low p-value (< 2.2e-16) suggests that the model is statistically significant.
Intercept (0.2361): When there’s no rain, the expected discharge is about 0.2361 cfs. However, this is not statistically significant (p = 0.053).
daily_rain (48.7387): For each inch of rain, the discharge is expected to increase by about 48.7387 cfs. This is highly statistically significant (p < 2e-16).
The residuals range from -69.175 to 204.684, with the median close to zero (0.142). This suggests the model fits reasonably well for most data points, but there are some large outliers. The very low p-value for the daily_rain coefficient (< 2e-16) indicates a strong statistical relationship between rainfall and discharge – BUT! It may still not be a good idea to use precipitation as a proxy for discharge, as the model is not perfect and there are likely other factors influencing discharge that are not captured by precipitation data……. need to run this by the team.
Sediment Deposition Binary Variable
Determined from the presence of sediment deposition at the beach. The data was collected by the Santa Barbara County Flood Control District. The data is available for the years 2010-2024.
# read in the dataall_beach_qual_raw <-read_csv(here(raw_data_dir, "beach_monitoring_wrcb-AllResultData.csv")) %>%clean_names()# Goleta Beachgoleta_beach_qual <- all_beach_qual_raw %>%filter(station_description %in%"Goleta Beach")# convert to lubridategoleta_beach_qual_ymd <- goleta_beach_qual %>%mutate(sample_date =mdy(sample_date)) %>%select(date = sample_date, gb_parameter = parameter, gb_result = result)# make binary variable for any exceedance of the state health standard: 104 MPN/100mL for enterococcus, 400 MPN/100mL for fecal coliforms, 400 MPN/100mL for ecoli, and 1000 MPN/100mL for total coliforms, and 0 if not exceededbinary_water_qual_gb <- goleta_beach_qual_ymd %>%mutate(enterococcus_exceed =ifelse(gb_parameter %in%"Enterococcus"& gb_result >104, 1, 0),ecoli_exceed =ifelse(gb_parameter %in%"E. Coli"& gb_result >400, 1, 0),fecal_coliforms_exceed =ifelse(gb_parameter %in%"Fecal Coliforms"& gb_result >400, 1, 0),total_coliforms_exceed =ifelse(gb_parameter %in%"Total Coliforms"& gb_result >1000, 1, 0),any_exceed =ifelse(enterococcus_exceed ==1| ecoli_exceed ==1| fecal_coliforms_exceed ==1| total_coliforms_exceed ==1, 1, 0))
Code
# Carpinteria Beachcarpenteria_beach_qual <- all_beach_qual_raw %>%filter(station_description %in%"Carpinteria State")# convert to lubridatecarpenteria_beach_qual_ymd <- carpenteria_beach_qual %>%mutate(sample_date =mdy(sample_date)) %>%select(date = sample_date, carp_parameter = parameter, carp_result = result)# make binary variable for any exceedance of the state health standard: 104 MPN/100mL for enterococcus, 400 MPN/100mL for fecal coliforms, 400 MPN/100mL for ecoli, and 1000 MPN/100mL for total coliforms, and 0 if not exceededbinary_water_qual_carp <- carpenteria_beach_qual_ymd %>%mutate(enterococcus_exceed =ifelse(carp_parameter %in%"Enterococcus"& carp_result >104, 1, 0),ecoli_exceed =ifelse(carp_parameter %in%"E. Coli"& carp_result >400, 1, 0),fecal_coliforms_exceed =ifelse(carp_parameter %in%"Fecal Coliforms"& carp_result >400, 1, 0),total_coliforms_exceed =ifelse(carp_parameter %in%"Total Coliforms"& carp_result >1000, 1, 0),any_exceed =ifelse(enterococcus_exceed ==1| ecoli_exceed ==1| fecal_coliforms_exceed ==1| total_coliforms_exceed ==1, 1, 0))
Code
# Arroyo Burro Beach (control site)ab_beach_qual_raw <-read_xlsx(here(raw_data_dir, "beach_monitoring_wrcb-arroyo_burro.xlsx")) %>%clean_names()# convert to lubridateab_beach_qual_ymd <- ab_beach_qual_raw %>%mutate(sample_date =ymd(sample_date)) %>%select(date = sample_date, ab_parameter = parameter, ab_result = result)# make binary variable for any exceedance of the state health standard: 104 MPN/100mL for enterococcus, 400 MPN/100mL for fecal coliforms, 400 MPN/100mL for ecoli, and 1000 MPN/100mL for total coliforms, and 0 if not exceededbinary_water_qual_ab <- ab_beach_qual_ymd %>%mutate(enterococcus_exceed =ifelse(ab_parameter %in%"Enterococcus"& ab_result >104, 1, 0),ecoli_exceed =ifelse(ab_parameter %in%"E. Coli"& ab_result >400, 1, 0),fecal_coliforms_exceed =ifelse(ab_parameter %in%"Fecal Coliforms"& ab_result >400, 1, 0),total_coliforms_exceed =ifelse(ab_parameter %in%"Total Coliforms"& ab_result >1000, 1, 0),any_exceed =ifelse(enterococcus_exceed ==1| ecoli_exceed ==1| fecal_coliforms_exceed ==1| total_coliforms_exceed ==1, 1, 0))
Merge Data
Question: should I drop NAs before modeling?
Code
# Goleta Beachgoleta_beach_data <- binary_water_qual_gb %>%left_join(san_jose_creek_discharge, by ="date") %>%left_join(noaa_precip_gb, by ="date") %>%left_join(goleta_beach_deposition, by ="date")
Code
# Carpinteria Beachcarpenteria_beach_data <- binary_water_qual_carp %>%left_join(carp_creek_discharge, by ="date") %>%left_join(carp_precip, by ="date") %>%left_join(carp_beach_deposition, by ="date")
Code
# Arroyo Burro Beach (control site)ab_beach_data <- binary_water_qual_ab %>%left_join(ab_creek_discharge_cfs, by ="date") %>%left_join(noaa_precip_ab_east, by ="date") %>%left_join(ab_beach_deposition, by ="date")
Check Assumptions and Assess Linearity
Goleta Beach
Code
# test for independence of observations# this assumption is met since data collection was random and independent
Code
# check for multicollinearity using Variance Inflation Factors (VIF)library(car)goleta_beach_data$daily_rain <-as.numeric(goleta_beach_data$daily_rain) # because it was a character beforegoleta_model <-glm(any_exceed ~ sj_discharge + daily_rain + deposition_occurred, data = goleta_beach_data, family = binomial)vif(goleta_model) # use the vif() function from the car package to check for multicollinearity. VIF values greater than 5 or 10 indicate potential multicollinearity issues
# assess linearity between continuous predictors and log odds of the outcome# perform Box-Tidwell test, interaction terms are the continuous variablesgoleta_beach_data$sj_discharge_interact <- goleta_beach_data$sj_discharge *log(goleta_beach_data$sj_discharge)goleta_beach_data$daily_rain_interact <- goleta_beach_data$daily_rain *log(goleta_beach_data$daily_rain +1) # adding 1 to avoid log(0)gb_box_tidwell_model <-glm(any_exceed ~ sj_discharge + daily_rain + deposition_occurred + sj_discharge_interact + daily_rain_interact, data = goleta_beach_data, family = binomial)summary(gb_box_tidwell_model) # deposition_occurred are statistically significant (p < 0.05) but that is okay because it is not linear, it is binary. The interaction terms (sj_discharge_interact and daily_rain_interact) are not statistically significant (p > 0.05).
Call:
glm(formula = any_exceed ~ sj_discharge + daily_rain + deposition_occurred +
sj_discharge_interact + daily_rain_interact, family = binomial,
data = goleta_beach_data)
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -2.55945 0.10273 -24.914 <2e-16 ***
sj_discharge 0.09211 0.06204 1.485 0.138
daily_rain 2.32432 1.79781 1.293 0.196
deposition_occurred 2.25807 0.23231 9.720 <2e-16 ***
sj_discharge_interact -0.00993 0.01611 -0.617 0.538
daily_rain_interact 0.52492 2.72104 0.193 0.847
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 1264.9 on 1736 degrees of freedom
Residual deviance: 1053.0 on 1731 degrees of freedom
(543 observations deleted due to missingness)
AIC: 1065
Number of Fisher Scoring iterations: 5
Code
# Coefficients:# Estimate Std. Error z value Pr(>|z|) # (Intercept) -2.55945 0.10273 -24.914 <2e-16 ***# sj_discharge 0.09211 0.06204 1.485 0.138 # daily_rain 2.32432 1.79781 1.293 0.196 # deposition_occurred 2.25807 0.23231 9.720 <2e-16 ***# sj_discharge_interact -0.00993 0.01611 -0.617 0.538 # daily_rain_interact 0.52492 2.72104 0.193 0.847 # ---# Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1# # (Dispersion parameter for binomial family taken to be 1)# # Null deviance: 1264.9 on 1736 degrees of freedom# Residual deviance: 1053.0 on 1731 degrees of freedom# (543 observations deleted due to missingness)# AIC: 1065# # Number of Fisher Scoring iterations: 5# the Box-Tidwell test is performed by creating interaction terms between continuous predictors and their natural logs, then including these in the model. Significant interaction terms suggest non-linearity. # non-significant interaction terms (p > 0.05) suggest linearity between the continuous predictors and the log odds of the outcome.
Code
# create scatter plots of log odds vs. each predictor# library(ggplot2)# # # function to calculate log odds, visually assess linearity# calc_log_odds <- function(x) log(x / (1 - x))# # ggplot(goleta_beach_data, aes(x = sj_discharge, y = calc_log_odds(any_exceed))) +# geom_point() +# geom_smooth(method = "loess") +# labs(title = "Log Odds vs. San Jose Creek Discharge")# # ggplot(goleta_beach_data, aes(x = daily_rain, y = calc_log_odds(any_exceed))) +# geom_point() +# geom_smooth(method = "loess") +# labs(title = "Log Odds vs. Daily Rain")# # # generate partial residual plots# library(effects) # partial residual plots are generated using the effects package to further assess linearity# # plot(allEffects(goleta_model)) # looks fairly linear??
Carpinteria Beach
Code
# test for independence of observations# this assumption is met since data collection was random and independent
Code
# check for multicollinearity using Variance Inflation Factors (VIF)library(car)carpenteria_beach_data$daily_rain <-as.numeric(carpenteria_beach_data$daily_rain) # because it was a character beforecarpenteria_model <-glm(any_exceed ~ carp_discharge + daily_rain + deposition_occurred, data = carpenteria_beach_data, family = binomial)vif(carpenteria_model) # use the vif() function from the car package to check for multicollinearity. VIF values greater than 5 or 10 indicate potential multicollinearity issues
# assess linearity between continuous predictors and log odds of the outcome# perform Box-Tidwell test, interaction terms are the continuous variablescarpenteria_beach_data$carp_discharge_interact <- carpenteria_beach_data$carp_discharge *log(carpenteria_beach_data$carp_discharge)carpenteria_beach_data$daily_rain_interact <- carpenteria_beach_data$daily_rain *log(carpenteria_beach_data$daily_rain +1) # adding 1 to avoid log(0)carp_box_tidwell_model <-glm(any_exceed ~ carp_discharge + daily_rain + deposition_occurred + carp_discharge_interact + daily_rain_interact, data = carpenteria_beach_data, family = binomial)summary(carp_box_tidwell_model) # daily_rain is statistically significant (p < 0.05). The interaction terms (sj_discharge_interact and daily_rain_interact) are not statistically significant (p > 0.05).
Call:
glm(formula = any_exceed ~ carp_discharge + daily_rain + deposition_occurred +
carp_discharge_interact + daily_rain_interact, family = binomial,
data = carpenteria_beach_data)
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -1.328965 0.303993 -4.372 1.23e-05 ***
carp_discharge 0.069389 0.045635 1.521 0.1284
daily_rain 3.995332 1.592725 2.508 0.0121 *
deposition_occurred 0.358612 0.447016 0.802 0.4224
carp_discharge_interact -0.013701 0.009309 -1.472 0.1411
daily_rain_interact -2.636537 1.886912 -1.397 0.1623
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 240.70 on 175 degrees of freedom
Residual deviance: 210.83 on 170 degrees of freedom
(1984 observations deleted due to missingness)
AIC: 222.83
Number of Fisher Scoring iterations: 4
Code
# Coefficients:# Estimate Std. Error z value Pr(>|z|) # (Intercept) -1.263639 0.291883 -4.329 1.5e-05 ***# carp_discharge 0.068925 0.045598 1.512 0.1306 # daily_rain 3.863127 1.582730 2.441 0.0147 * # deposition_occurred 0.107901 0.485392 0.222 0.8241 # carp_discharge_interact -0.013423 0.009267 -1.448 0.1475 # daily_rain_interact -2.554095 1.883818 -1.356 0.1752 # ---# Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1# # (Dispersion parameter for binomial family taken to be 1)# # Null deviance: 240.70 on 175 degrees of freedom# Residual deviance: 211.42 on 170 degrees of freedom# (1984 observations deleted due to missingness)# AIC: 223.42# # Number of Fisher Scoring iterations: 4# the Box-Tidwell test is performed by creating interaction terms between continuous predictors and their natural logs, then including these in the model. Significant interaction terms suggest non-linearity. # non-significant interaction terms (p > 0.05) suggest linearity between the continuous predictors and the log odds of the outcome.
Code
# create scatter plots of log odds vs. each predictor# library(ggplot2)# function to calculate log odds, visually assess linearity# calc_log_odds <- function(x) log(x / (1 - x))# # ggplot(goleta_beach_data, aes(x = sj_discharge, y = calc_log_odds(any_exceed))) +# geom_point() +# geom_smooth(method = "loess") +# labs(title = "Log Odds vs. San Jose Creek Discharge")# # ggplot(goleta_beach_data, aes(x = daily_rain, y = calc_log_odds(any_exceed))) +# geom_point() +# geom_smooth(method = "loess") +# labs(title = "Log Odds vs. Daily Rain")# generate partial residual plots# library(effects) # partial residual plots are generated using the effects package to further assess linearity# # plot(allEffects(goleta_model))# Error in `contrasts<-`(`*tmp*`, value = ca) : # contrasts apply only to factors# In addition: Warning message:# In model.frame.default(Terms, predict.data, xlev = factor.levels, :# variable 'daily_rain' is not a factor
Arroyo Burro Beach
Code
# test for independence of observations# this assumption is met since data collection was random and independent
Code
# check for multicollinearity using Variance Inflation Factors (VIF)# ab_model <- glm(any_exceed ~ ab_discharge_cfs + daily_rain + deposition_occurred, # data = ab_beach_data, # family = binomial) # Error in `contrasts<-`(`*tmp*`, value = contr.funs[1 + isOF[nn]]) : contrasts can be applied only to factors with 2 or more levels# vif(ab_model) # error: there are aliased coefficients in the model. This is likely due to the binary variable 'deposition_occurred'. We can ignore this error and proceed with the model.# # cor(ab_beach_data[c("ab_discharge_cfs", "daily_rain", "deposition_occurred")])# # alias(ab_model)## not using deposition (since it is all 0s since there has been no deposition there)ab_model <-glm(any_exceed ~ ab_discharge_cfs + daily_rain, data = ab_beach_data, family = binomial)vif(ab_model) # use the vif() function from the car package to check for multicollinearity. VIF values greater than 5 or 10 indicate potential multicollinearity issues
# assess linearity between continuous predictors and log odds of the outcome# perform Box-Tidwell test, interaction terms are the continuous variablesab_beach_data$ab_discharge_interact <- ab_beach_data$ab_discharge_cfs *log(ab_beach_data$ab_discharge_cfs +1) # adding 1 to avoid log(0)ab_beach_data$daily_rain_interact <- ab_beach_data$daily_rain *log(ab_beach_data$daily_rain +1) # adding 1 to avoid log(0)ab_box_tidwell_model <-glm(any_exceed ~ ab_discharge_cfs + daily_rain + ab_discharge_interact + daily_rain_interact, data = ab_beach_data, family = binomial)summary(ab_box_tidwell_model)
Call:
glm(formula = any_exceed ~ ab_discharge_cfs + daily_rain + ab_discharge_interact +
daily_rain_interact, family = binomial, data = ab_beach_data)
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -2.3788 0.1239 -19.207 < 2e-16 ***
ab_discharge_cfs 0.5060 0.1274 3.972 7.13e-05 ***
daily_rain 1.3363 2.4133 0.554 0.57976
ab_discharge_interact -0.1169 0.0389 -3.006 0.00264 **
daily_rain_interact 0.8060 4.0304 0.200 0.84150
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 1029.19 on 1306 degrees of freedom
Residual deviance: 936.23 on 1302 degrees of freedom
(2438 observations deleted due to missingness)
AIC: 946.23
Number of Fisher Scoring iterations: 5
Code
# Call:# glm(formula = any_exceed ~ ab_discharge_cfs + daily_rain + ab_discharge_interact + # daily_rain_interact, family = binomial, data = ab_beach_data)# # Coefficients:# Estimate Std. Error z value Pr(>|z|) # (Intercept) -2.3788 0.1239 -19.207 < 2e-16 ***# ab_discharge_cfs 0.5060 0.1274 3.972 7.13e-05 ***# daily_rain 1.3363 2.4133 0.554 0.57976 # ab_discharge_interact -0.1169 0.0389 -3.006 0.00264 ** # daily_rain_interact 0.8060 4.0304 0.200 0.84150 # ---# Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1# # (Dispersion parameter for binomial family taken to be 1)# # Null deviance: 1029.19 on 1306 degrees of freedom# Residual deviance: 936.23 on 1302 degrees of freedom# (2438 observations deleted due to missingness)# AIC: 946.23# # Number of Fisher Scoring iterations: 5# the Box-Tidwell test is performed by creating interaction terms between continuous predictors and their natural logs, then including these in the model. Significant interaction terms suggest non-linearity.# non-significant interaction terms (p > 0.05) suggest linearity between the continuous predictors and the log odds of the outcome# the daily_rain and daily_rain_interact are not statistically significant (p > 0.05), suggesting linearity between daily_rain and the log odds of 'any_exceed'.# however, the ab_discharge_cfs and ab_discharge_interact are statistically significant (p < 0.05), suggesting non-linearity between ab_discharge_cfs and the log odds of 'any_exceed'.# --------- Transforming the discharge data to see if it can pass the Box-Tidwell test ------------ab_beach_data$log_discharge <-log(ab_beach_data$ab_discharge_cfs +1)ab_beach_data$log_discharge_interact <- ab_beach_data$log_discharge *log(ab_beach_data$log_discharge +1) # adding 1 to avoid log(0)ab_box_tidwell_model_log <-glm(any_exceed ~ log_discharge + daily_rain + log_discharge_interact + daily_rain_interact, data = ab_beach_data, family = binomial)summary(ab_box_tidwell_model_log)
Call:
glm(formula = any_exceed ~ log_discharge + daily_rain + log_discharge_interact +
daily_rain_interact, family = binomial, data = ab_beach_data)
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -2.4562 0.1856 -13.231 <2e-16 ***
log_discharge 0.7282 0.5292 1.376 0.169
daily_rain 1.8130 2.3495 0.772 0.440
log_discharge_interact 0.1228 0.4111 0.299 0.765
daily_rain_interact 0.3000 3.9451 0.076 0.939
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 1029.19 on 1306 degrees of freedom
Residual deviance: 937.11 on 1302 degrees of freedom
(2438 observations deleted due to missingness)
AIC: 947.11
Number of Fisher Scoring iterations: 5
Code
# Call:# glm(formula = any_exceed ~ log_discharge + daily_rain + log_discharge_interact + # daily_rain_interact, family = binomial, data = ab_beach_data)# # Coefficients:# Estimate Std. Error z value Pr(>|z|) # (Intercept) -2.4562 0.1856 -13.231 <2e-16 ***# log_discharge 0.7282 0.5292 1.376 0.169 # daily_rain 1.8130 2.3495 0.772 0.440 # log_discharge_interact 0.1228 0.4111 0.299 0.765 # daily_rain_interact 0.3000 3.9451 0.076 0.939 # ---# Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1# # (Dispersion parameter for binomial family taken to be 1)# # Null deviance: 1029.19 on 1306 degrees of freedom# Residual deviance: 937.11 on 1302 degrees of freedom# (2438 observations deleted due to missingness)# AIC: 947.11# # Number of Fisher Scoring iterations: 5# this is great! now none of the interaction terms are statistically significant (p > 0.05), suggesting linearity between the continuous predictors and the log odds of the outcome.
Fit Multivariate Logistic Regression Models
Goleta Beach
For any exceedance of the state health standard: 104 MPN/100mL for enterococcus, 400 MPN/100mL for fecal coliforms, 400 MPN/100mL for ecoli, and 1000 MPN/100mL for total coliforms.
Code
# fit the multivariate logistic regression modelgoleta_model <-glm(any_exceed ~ sj_discharge + daily_rain + deposition_occurred, data = goleta_beach_data, family = binomial)summary(goleta_model)
Call:
glm(formula = any_exceed ~ sj_discharge + daily_rain + deposition_occurred,
family = binomial, data = goleta_beach_data)
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -2.61411 0.08552 -30.567 < 2e-16 ***
sj_discharge 0.05822 0.01146 5.081 3.75e-07 ***
daily_rain 2.48553 0.47297 5.255 1.48e-07 ***
deposition_occurred 2.35552 0.22109 10.654 < 2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 1512.2 on 2276 degrees of freedom
Residual deviance: 1285.6 on 2273 degrees of freedom
(3 observations deleted due to missingness)
AIC: 1293.6
Number of Fisher Scoring iterations: 5
Code
# Coefficients:# Estimate Std. Error z value Pr(>|z|) # (Intercept) -2.61411 0.08552 -30.567 < 2e-16 ***# sj_discharge 0.05822 0.01146 5.081 3.75e-07 ***# daily_rain 2.48553 0.47297 5.255 1.48e-07 ***# deposition_occurred 2.35552 0.22109 10.654 < 2e-16 ***# ---# Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1# # (Dispersion parameter for binomial family taken to be 1)# # Null deviance: 1512.2 on 2276 degrees of freedom# Residual deviance: 1285.6 on 2273 degrees of freedom# (3 observations deleted due to missingness)# AIC: 1293.6# # Number of Fisher Scoring iterations: 5# The model predicts 'any_exceed' based on 'sj_discharge', 'daily_rain', and 'deposition_occurred'.# All predictors are statistically significant:# sj_discharge: p = 3.75e-07 ***. For each unit increase, the log odds of 'any_exceed' increase by 0.05822.# # daily_rain: p = 1.48e-07 ***. For each unit increase, the log odds of 'any_exceed' increase by 2.48553.# # deposition_occurred: p < 2e-16 ***. When deposition occurs, the log odds of 'any_exceed' increase by 2.35552.# The estimates show the change in log odds of 'any_exceed' for a one-unit increase in each predictor.# Null deviance: 1512.2 on 2276 degrees of freedom# Residual deviance: 1285.6 on 2273 degrees of freedom## the reduction in deviance in the residual suggests the model is better than a null model.# AIC: 1293.6 (lower values indicate better model fit)
For only enterococcus exceedances:
Code
# fit the multivariate logistic regression modelgoleta_model_enterococcus <-glm(enterococcus_exceed ~ sj_discharge + daily_rain + deposition_occurred, data = goleta_beach_data, family = binomial)summary(goleta_model_enterococcus)
Call:
glm(formula = enterococcus_exceed ~ sj_discharge + daily_rain +
deposition_occurred, family = binomial, data = goleta_beach_data)
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -3.73704 0.14137 -26.435 < 2e-16 ***
sj_discharge 0.03635 0.01062 3.422 0.000621 ***
daily_rain 1.52675 0.55583 2.747 0.006018 **
deposition_occurred 2.00391 0.28450 7.044 1.87e-12 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 692.93 on 2276 degrees of freedom
Residual deviance: 618.81 on 2273 degrees of freedom
(3 observations deleted due to missingness)
AIC: 626.81
Number of Fisher Scoring iterations: 6
Code
# Coefficients:# Estimate Std. Error z value Pr(>|z|) # (Intercept) -3.73704 0.14137 -26.435 < 2e-16 ***# sj_discharge 0.03635 0.01062 3.422 0.000621 ***# daily_rain 1.52675 0.55583 2.747 0.006018 ** # deposition_occurred 2.00391 0.28450 7.044 1.87e-12 ***# ---# Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1# # (Dispersion parameter for binomial family taken to be 1)# # Null deviance: 692.93 on 2276 degrees of freedom# Residual deviance: 618.81 on 2273 degrees of freedom# (3 observations deleted due to missingness)# AIC: 626.81# # Number of Fisher Scoring iterations: 6
It can be interpreted that for each unit increase in San Jose Creek discharge, the log odds of an enterococcus exceedance increase by 0.03635. For each unit increase in daily rain, the log odds of an enterococcus exceedance increase by 1.52675. When deposition occurs, the log odds of an enterococcus exceedance increase by 2.00391.
For only fecal coliform exceedances:
Code
# fit the multivariate logistic regression modelgoleta_model_fecal_coliforms <-glm(fecal_coliforms_exceed ~ sj_discharge + daily_rain + deposition_occurred, data = goleta_beach_data, family = binomial)summary(goleta_model_fecal_coliforms)
Call:
glm(formula = fecal_coliforms_exceed ~ sj_discharge + daily_rain +
deposition_occurred, family = binomial, data = goleta_beach_data)
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -5.45009 0.32490 -16.775 < 2e-16 ***
sj_discharge 0.01122 0.03080 0.364 0.716
daily_rain 1.54283 1.05665 1.460 0.144
deposition_occurred 2.55626 0.53015 4.822 1.42e-06 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 200.38 on 2276 degrees of freedom
Residual deviance: 176.60 on 2273 degrees of freedom
(3 observations deleted due to missingness)
AIC: 184.6
Number of Fisher Scoring iterations: 8
Code
# Coefficients:# Estimate Std. Error z value Pr(>|z|) # (Intercept) -5.45009 0.32490 -16.775 < 2e-16 ***# sj_discharge 0.01122 0.03080 0.364 0.716 # daily_rain 1.54283 1.05665 1.460 0.144 # deposition_occurred 2.55626 0.53015 4.822 1.42e-06 ***# ---# Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1# # (Dispersion parameter for binomial family taken to be 1)# # Null deviance: 200.38 on 2276 degrees of freedom# Residual deviance: 176.60 on 2273 degrees of freedom# (3 observations deleted due to missingness)# AIC: 184.6# # Number of Fisher Scoring iterations: 8
This is VERY interesting!! Only deposition_occurred is statistically significant for fecal coliform exceedances. For each unit increase in San Jose Creek discharge, the log odds of a fecal coliform exceedance increase by 0.01122. For each unit increase in daily rain, the log odds of a fecal coliform exceedance increase by 1.54283. When deposition occurs, the log odds of a fecal coliform exceedance increase by 2.55626. That is almost a 5-fold increase in the log odds of a fecal coliform exceedance when deposition occurs (exp(2.55626) = 12.88). This can be interpreted as a 12.88 times increase in the odds of a fecal coliform exceedance when deposition occurs.
For only total coliform exceedances:
Code
# fit the multivariate logistic regression modelgoleta_model_total_coliforms <-glm(total_coliforms_exceed ~ sj_discharge + daily_rain + deposition_occurred, data = goleta_beach_data, family = binomial)summary(goleta_model_total_coliforms)
Call:
glm(formula = total_coliforms_exceed ~ sj_discharge + daily_rain +
deposition_occurred, family = binomial, data = goleta_beach_data)
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -3.498516 0.126464 -27.664 < 2e-16 ***
sj_discharge 0.041923 0.009964 4.208 2.58e-05 ***
daily_rain 1.292201 0.546873 2.363 0.0181 *
deposition_occurred 2.075757 0.259926 7.986 1.39e-15 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 826.78 on 2276 degrees of freedom
Residual deviance: 731.99 on 2273 degrees of freedom
(3 observations deleted due to missingness)
AIC: 739.99
Number of Fisher Scoring iterations: 6
Code
# Coefficients:# Estimate Std. Error z value Pr(>|z|) # (Intercept) -3.498516 0.126464 -27.664 < 2e-16 ***# sj_discharge 0.041923 0.009964 4.208 2.58e-05 ***# daily_rain 1.292201 0.546873 2.363 0.0181 * # deposition_occurred 2.075757 0.259926 7.986 1.39e-15 ***# ---# Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1# # (Dispersion parameter for binomial family taken to be 1)# # Null deviance: 826.78 on 2276 degrees of freedom# Residual deviance: 731.99 on 2273 degrees of freedom# (3 observations deleted due to missingness)# AIC: 739.99# # Number of Fisher Scoring iterations: 6
These are all significant which makes sense as any_exceed is a reflection of this in a way. The log odds of a total coliform exceedance increase by 0.041923 for each unit increase in San Jose Creek discharge. The log odds of a total coliform exceedance increase by 1.292201 for each unit increase in daily rain. When deposition occurs, the log odds of a total coliform exceedance increase by 2.075757.
Carpinteria Beach
Code
carpenteria_model <-glm(any_exceed ~ carp_discharge + daily_rain + deposition_occurred, data = carpenteria_beach_data, family = binomial)summary(carpenteria_model)
Call:
glm(formula = any_exceed ~ carp_discharge + daily_rain + deposition_occurred,
family = binomial, data = carpenteria_beach_data)
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -1.90930 0.19933 -9.579 < 2e-16 ***
carp_discharge 0.01394 0.00830 1.680 0.0930 .
daily_rain 2.49736 0.42508 5.875 4.23e-09 ***
deposition_occurred 1.03163 0.44647 2.311 0.0209 *
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 368.61 on 316 degrees of freedom
Residual deviance: 305.40 on 313 degrees of freedom
(1843 observations deleted due to missingness)
AIC: 313.4
Number of Fisher Scoring iterations: 4
Code
# Estimate Std. Error z value Pr(>|z|) # (Intercept) -1.90930 0.19933 -9.579 < 2e-16 ***# carp_discharge 0.01394 0.00830 1.680 0.0930 . # daily_rain 2.49736 0.42508 5.875 4.23e-09 ***# deposition_occurred 1.03163 0.44647 2.311 0.0209 * # ---# Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1# # (Dispersion parameter for binomial family taken to be 1)# # Null deviance: 368.61 on 316 degrees of freedom# Residual deviance: 305.40 on 313 degrees of freedom# (1843 observations deleted due to missingness)# AIC: 313.4# # Number of Fisher Scoring iterations: 4# The model predicts 'any_exceed' based on 'carp_discharge', 'daily_rain', and 'deposition_occurred'.# Almost all predictors are statistically significant:# carp_discharge: p = 0.0930, if the alpha were 0.1 then it would be significant. For each unit increase, the log odds of 'any_exceed' increase by 0.01394.# daily_rain: p = 4.23e-09 ***. For each unit increase, the log odds of 'any_exceed' increase by 2.49736.# deposition_occurred: p = 0.0209 *. When deposition occurs, the log odds of 'any_exceed' increase by 1.03163.# The estimates show the change in log odds of 'any_exceed' for a one-unit increase in each predictor.
For Carpinteria’s any exceedance model, daily rain and deposition_occurred are statistically significant. For each unit increase in Carpinteria Creek discharge, the log odds of an exceedance increase by 0.01394. For each unit increase in daily rain, the log odds of an exceedance increase by 2.49736. When deposition occurs, the log odds of an exceedance increase by 1.03163. However, Carpinteria Creek discharge is not statistically significant. If the alpha were 0.1, then it would be significant.
For only enterococcus exceedances:
Code
carpenteria_model_enterococcus <-glm(enterococcus_exceed ~ carp_discharge + daily_rain + deposition_occurred, data = carpenteria_beach_data, family = binomial)summary(carpenteria_model_enterococcus)
Call:
glm(formula = enterococcus_exceed ~ carp_discharge + daily_rain +
deposition_occurred, family = binomial, data = carpenteria_beach_data)
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -2.840435 0.280285 -10.134 < 2e-16 ***
carp_discharge 0.005078 0.006431 0.790 0.42977
daily_rain 1.453794 0.447725 3.247 0.00117 **
deposition_occurred 0.483269 0.677161 0.714 0.47543
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 198.53 on 316 degrees of freedom
Residual deviance: 184.18 on 313 degrees of freedom
(1843 observations deleted due to missingness)
AIC: 192.18
Number of Fisher Scoring iterations: 5
Code
# Coefficients:# Estimate Std. Error z value Pr(>|z|) # (Intercept) -2.840435 0.280285 -10.134 < 2e-16 ***# carp_discharge 0.005078 0.006431 0.790 0.42977 # daily_rain 1.453794 0.447725 3.247 0.00117 ** # deposition_occurred 0.483269 0.677161 0.714 0.47543 # ---# Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1# # (Dispersion parameter for binomial family taken to be 1)# # Null deviance: 198.53 on 316 degrees of freedom# Residual deviance: 184.18 on 313 degrees of freedom# (1843 observations deleted due to missingness)# AIC: 192.18# # Number of Fisher Scoring iterations: 5# only daily_rain is significant here. For each unit increase in Carpinteria Creek discharge, the log odds of an enterococcus exceedance increase by 0.005078. For each unit increase in daily rain, the log odds of an enterococcus exceedance increase by 1.453794. When deposition occurs, the log odds of an enterococcus exceedance increase by 0.483269.
Only daily rain is statistically significant for enterococcus exceedances. For each unit increase in Carpinteria Creek discharge, the log odds of an enterococcus exceedance increase by 0.005078. For each unit increase in daily rain, the log odds of an enterococcus exceedance increase by 1.453794. When deposition occurs, the log odds of an enterococcus exceedance increase by 0.483269.
For only fecal coliform exceedances:
Code
carpenteria_model_fecal_coliforms <-glm(fecal_coliforms_exceed ~ carp_discharge + daily_rain + deposition_occurred, data = carpenteria_beach_data, family = binomial)summary(carpenteria_model_fecal_coliforms)
Call:
glm(formula = fecal_coliforms_exceed ~ carp_discharge + daily_rain +
deposition_occurred, family = binomial, data = carpenteria_beach_data)
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -4.309047 0.516033 -8.350 < 2e-16 ***
carp_discharge 0.008865 0.008975 0.988 0.32327
daily_rain 2.041463 0.624148 3.271 0.00107 **
deposition_occurred -0.217494 1.427782 -0.152 0.87893
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 95.556 on 316 degrees of freedom
Residual deviance: 80.849 on 313 degrees of freedom
(1843 observations deleted due to missingness)
AIC: 88.849
Number of Fisher Scoring iterations: 6
Code
# Estimate Std. Error z value Pr(>|z|) # (Intercept) -4.309047 0.516033 -8.350 < 2e-16 ***# carp_discharge 0.008865 0.008975 0.988 0.32327 # daily_rain 2.041463 0.624148 3.271 0.00107 ** # deposition_occurred -0.217494 1.427782 -0.152 0.87893 # ---# Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1# # (Dispersion parameter for binomial family taken to be 1)# # Null deviance: 95.556 on 316 degrees of freedom# Residual deviance: 80.849 on 313 degrees of freedom# (1843 observations deleted due to missingness)# AIC: 88.849# # Number of Fisher Scoring iterations: 6
Again, only daily rain is statistically significant for fecal coliform exceedances. For each unit increase in Carpinteria Creek discharge, the log odds of a fecal coliform exceedance increase by 0.008865. For each unit increase in daily rain, the log odds of a fecal coliform exceedance increase by 2.041463. When deposition occurs, the log odds of a fecal coliform exceedance decrease by 0.217494.
For only ecoli exceedances:
Code
carpenteria_model_ecoli <-glm(ecoli_exceed ~ carp_discharge + daily_rain + deposition_occurred, data = carpenteria_beach_data, family = binomial)summary(carpenteria_model_ecoli)
Call:
glm(formula = ecoli_exceed ~ carp_discharge + daily_rain + deposition_occurred,
family = binomial, data = carpenteria_beach_data)
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -5.665753 1.039347 -5.451 5e-08 ***
carp_discharge -0.003776 0.015320 -0.246 0.805
daily_rain 1.780017 1.235258 1.441 0.150
deposition_occurred 2.131925 1.500636 1.421 0.155
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 33.933 on 316 degrees of freedom
Residual deviance: 30.878 on 313 degrees of freedom
(1843 observations deleted due to missingness)
AIC: 38.878
Number of Fisher Scoring iterations: 8
Code
# Coefficients:# Estimate Std. Error z value Pr(>|z|) # (Intercept) -5.665753 1.039347 -5.451 5e-08 ***# carp_discharge -0.003776 0.015320 -0.246 0.805 # daily_rain 1.780017 1.235258 1.441 0.150 # deposition_occurred 2.131925 1.500636 1.421 0.155 # ---# Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1# # (Dispersion parameter for binomial family taken to be 1)# # Null deviance: 33.933 on 316 degrees of freedom# Residual deviance: 30.878 on 313 degrees of freedom# (1843 observations deleted due to missingness)# AIC: 38.878# # Number of Fisher Scoring iterations: 8
None of the predictors are statistically significant for ecoli exceedances.
For only total coliform exceedances:
Code
carpenteria_model_total_coliforms <-glm(total_coliforms_exceed ~ carp_discharge + daily_rain + deposition_occurred, data = carpenteria_beach_data, family = binomial)summary(carpenteria_model_total_coliforms)
Call:
glm(formula = total_coliforms_exceed ~ carp_discharge + daily_rain +
deposition_occurred, family = binomial, data = carpenteria_beach_data)
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -2.4308875 0.2405496 -10.106 < 2e-16 ***
carp_discharge 0.0005904 0.0061057 0.097 0.92297
daily_rain 1.3279157 0.4154074 3.197 0.00139 **
deposition_occurred 1.0120347 0.5245308 1.929 0.05368 .
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 244.17 on 316 degrees of freedom
Residual deviance: 230.32 on 313 degrees of freedom
(1843 observations deleted due to missingness)
AIC: 238.32
Number of Fisher Scoring iterations: 5
Code
# Coefficients:# Estimate Std. Error z value Pr(>|z|) # (Intercept) -2.4308875 0.2405496 -10.106 < 2e-16 ***# carp_discharge 0.0005904 0.0061057 0.097 0.92297 # daily_rain 1.3279157 0.4154074 3.197 0.00139 ** # deposition_occurred 1.0120347 0.5245308 1.929 0.05368 . # ---# Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1# # (Dispersion parameter for binomial family taken to be 1)# # Null deviance: 244.17 on 316 degrees of freedom# Residual deviance: 230.32 on 313 degrees of freedom# (1843 observations deleted due to missingness)# AIC: 238.32# # Number of Fisher Scoring iterations: 5
Using an alpha value of 0.05, only daily rain is statistically significant for total coliform exceedances. However, if an alpha of 0.1 is used, then deposition_occurred is also significant. For each unit increase in Carpinteria Creek discharge, the log odds of a total coliform exceedance increase by 0.0005904. For each unit increase in daily rain, the log odds of a total coliform exceedance increase by 1.3279157. When deposition occurs, the log odds of a total coliform exceedance increase by 1.0120347. This is translated to a 2.75 times increase in the odds of a total coliform exceedance when deposition occurs.
Arroyo Burro
Code
# fit the multivariate logistic regression modelab_any_exceed_model <-glm(any_exceed ~ log_discharge + daily_rain, data = ab_beach_data, family = binomial)summary(ab_any_exceed_model)
Call:
glm(formula = any_exceed ~ log_discharge + daily_rain, family = binomial,
data = ab_beach_data)
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -2.4969 0.1269 -19.676 < 2e-16 ***
log_discharge 0.8783 0.1444 6.082 1.19e-09 ***
daily_rain 2.0350 0.7818 2.603 0.00924 **
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 1029.19 on 1306 degrees of freedom
Residual deviance: 937.21 on 1304 degrees of freedom
(2438 observations deleted due to missingness)
AIC: 943.21
Number of Fisher Scoring iterations: 4
Code
# Call:# glm(formula = any_exceed ~ log_discharge + daily_rain, family = binomial, # data = ab_beach_data)# # Coefficients:# Estimate Std. Error z value Pr(>|z|) # (Intercept) -2.4969 0.1269 -19.676 < 2e-16 ***# log_discharge 0.8783 0.1444 6.082 1.19e-09 ***# daily_rain 2.0350 0.7818 2.603 0.00924 ** # ---# Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1# # (Dispersion parameter for binomial family taken to be 1)# # Null deviance: 1029.19 on 1306 degrees of freedom# Residual deviance: 937.21 on 1304 degrees of freedom# (2438 observations deleted due to missingness)# AIC: 943.21# # Number of Fisher Scoring iterations: 4# The model predicts 'any_exceed' based on 'log_discharge' and 'daily_rain'.# Both predictors are statistically significant:# log_discharge: p = 1.19e-09 ***. For each unit increase, the log odds of 'any_exceed' increase by 0.8783.# daily_rain: p = 0.00924 **. For each unit increase, the log odds of 'any_exceed' increase by 2.0350.
For Arroyo Burro, both log_discharge and daily_rain are statistically significant in causing any type of exceedance. For each unit increase in log discharge, the log odds of an exceedance increase by 0.8783. For each unit increase in daily rain, the log odds of an exceedance increase by 2.0350.
For only enterococcus exceedances:
Code
# fit the multivariate logistic regression modelab_enterococcus_model <-glm(enterococcus_exceed ~ log_discharge + daily_rain, data = ab_beach_data, family = binomial)summary(ab_enterococcus_model)
Call:
glm(formula = enterococcus_exceed ~ log_discharge + daily_rain,
family = binomial, data = ab_beach_data)
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -3.4455 0.1939 -17.768 < 2e-16 ***
log_discharge 0.5868 0.2085 2.815 0.00488 **
daily_rain 0.8750 0.7308 1.197 0.23117
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 492.99 on 1306 degrees of freedom
Residual deviance: 473.36 on 1304 degrees of freedom
(2438 observations deleted due to missingness)
AIC: 479.36
Number of Fisher Scoring iterations: 6
Code
# Coefficients:# Estimate Std. Error z value Pr(>|z|) # (Intercept) -3.4455 0.1939 -17.768 < 2e-16 ***# log_discharge 0.5868 0.2085 2.815 0.00488 ** # daily_rain 0.8750 0.7308 1.197 0.23117 # ---# Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1# # (Dispersion parameter for binomial family taken to be 1)# # Null deviance: 492.99 on 1306 degrees of freedom# Residual deviance: 473.36 on 1304 degrees of freedom# (2438 observations deleted due to missingness)# AIC: 479.36# # Number of Fisher Scoring iterations: 6
For enterococcus exceedances, only log_discharge is statistically significant. For each unit increase in log discharge, the log odds of an enterococcus exceedance increase by 0.5868. For each unit increase in daily rain, the log odds of an enterococcus exceedance increase by 0.8750.
For only fecal coliform exceedances:
Code
# fit the multivariate logistic regression modelab_fecal_coliforms_model <-glm(fecal_coliforms_exceed ~ log_discharge + daily_rain, data = ab_beach_data, family = binomial)summary(ab_fecal_coliforms_model)
Call:
glm(formula = fecal_coliforms_exceed ~ log_discharge + daily_rain,
family = binomial, data = ab_beach_data)
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -5.8962 0.5287 -11.152 < 2e-16 ***
log_discharge 1.0782 0.3802 2.836 0.00457 **
daily_rain 1.2226 1.0199 1.199 0.23063
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 127.01 on 1306 degrees of freedom
Residual deviance: 104.29 on 1304 degrees of freedom
(2438 observations deleted due to missingness)
AIC: 110.29
Number of Fisher Scoring iterations: 8
Code
# Coefficients:# Estimate Std. Error z value Pr(>|z|) # (Intercept) -5.8962 0.5287 -11.152 < 2e-16 ***# log_discharge 1.0782 0.3802 2.836 0.00457 ** # daily_rain 1.2226 1.0199 1.199 0.23063 # ---# Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1# # (Dispersion parameter for binomial family taken to be 1)# # Null deviance: 127.01 on 1306 degrees of freedom# Residual deviance: 104.29 on 1304 degrees of freedom# (2438 observations deleted due to missingness)# AIC: 110.29# # Number of Fisher Scoring iterations: 8
Again, only log_discharge is statistically significant for fecal coliform exceedances. For each unit increase in log discharge, the log odds of a fecal coliform exceedance increase by 1.0782. For each unit increase in daily rain, the log odds of a fecal coliform exceedance increase by 1.2226.
For only total coliform exceedances:
Code
# fit the multivariate logistic regression modelab_total_coliforms_model <-glm(total_coliforms_exceed ~ log_discharge + daily_rain, data = ab_beach_data, family = binomial)summary(ab_total_coliforms_model)
Call:
glm(formula = total_coliforms_exceed ~ log_discharge + daily_rain,
family = binomial, data = ab_beach_data)
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -3.20974 0.16831 -19.070 < 2e-16 ***
log_discharge 0.81180 0.17328 4.685 2.8e-06 ***
daily_rain -0.05611 0.67946 -0.083 0.934
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 628.93 on 1306 degrees of freedom
Residual deviance: 600.32 on 1304 degrees of freedom
(2438 observations deleted due to missingness)
AIC: 606.32
Number of Fisher Scoring iterations: 5
Code
# Coefficients:# Estimate Std. Error z value Pr(>|z|) # (Intercept) -3.20974 0.16831 -19.070 < 2e-16 ***# log_discharge 0.81180 0.17328 4.685 2.8e-06 ***# daily_rain -0.05611 0.67946 -0.083 0.934 # ---# Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1# # (Dispersion parameter for binomial family taken to be 1)# # Null deviance: 628.93 on 1306 degrees of freedom# Residual deviance: 600.32 on 1304 degrees of freedom# (2438 observations deleted due to missingness)# AIC: 606.32# # Number of Fisher Scoring iterations: 5
For total coliform exceedances, only log_discharge is statistically significant. For each unit increase in log discharge, the log odds of a total coliform exceedance increase by 0.81180. For each unit increase in daily rain, the log odds of a total coliform exceedance decrease by 0.05611.
For only ecoli exceedances:
Code
# fit the multivariate logistic regression modelab_ecoli_model <-glm(ecoli_exceed ~ log_discharge + daily_rain, data = ab_beach_data, family = binomial)summary(ab_ecoli_model)
Call:
glm(formula = ecoli_exceed ~ log_discharge + daily_rain, family = binomial,
data = ab_beach_data)
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -4.7312 0.3493 -13.546 <2e-16 ***
log_discharge 0.6405 0.3484 1.839 0.066 .
daily_rain 0.4237 1.1795 0.359 0.719
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 190.02 on 1306 degrees of freedom
Residual deviance: 184.15 on 1304 degrees of freedom
(2438 observations deleted due to missingness)
AIC: 190.15
Number of Fisher Scoring iterations: 7
Code
# Coefficients:# Estimate Std. Error z value Pr(>|z|) # (Intercept) -4.7312 0.3493 -13.546 <2e-16 ***# log_discharge 0.6405 0.3484 1.839 0.066 . # daily_rain 0.4237 1.1795 0.359 0.719 # ---# Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1# # (Dispersion parameter for binomial family taken to be 1)# # Null deviance: 190.02 on 1306 degrees of freedom# Residual deviance: 184.15 on 1304 degrees of freedom# (2438 observations deleted due to missingness)# AIC: 190.15# # Number of Fisher Scoring iterations: 7
None of the predictors are statistically significant for ecoli exceedances, using an alpha value of 0.05.
Model Diagnostics
Dry Multivariate Logistic Regression Analysis
These models consider only days that have had less than 0.04 inches of rain, as according to USGS’s US Rain Days product https://earlywarning.usgs.gov/usraindry/rdreadme.php. This is to see if the discharge from the creeks is still a significant predictor of exceedances when there is no rain. This is important because the creeks are not receiving additional water from rain, so we can better understand the causality of deposition events on exceedances.
Goleta Beach
Code
# filter the data to only include days with less than 0.1 inches of raingoleta_beach_data_dry <- goleta_beach_data %>%filter(daily_rain <0.04)# accounting for lag, filtering to only observations of dry days that have been dry for the past 72 hours (aka three days)goleta_beach_data_dry_72hrs <- goleta_beach_data_dry %>%mutate(daily_rain_lag1 =lag(daily_rain, 1), # using dplyr's lag function to create lagged variablesdaily_rain_lag2 =lag(daily_rain, 2),daily_rain_lag3 =lag(daily_rain, 3)) %>%filter(daily_rain_lag1 <0.04) %>%filter(daily_rain_lag2 <0.04) %>%filter(daily_rain_lag3 <0.04)
Testing assumptions for a multiple logistic regression model:
Code
# check for correlation between predictorsgb_dry_cor <-cor(goleta_beach_data_dry_72hrs[, c("sj_discharge", "daily_rain", "deposition_occurred")])gb_dry_cor
Great, no multicollinearity between the predictors, since all VIF values are less than 5.
Code
# assess linearity between continuous predictors and log odds of the outcome# perform Box-Tidwell test, interaction terms are the continuous variablesgb_bt_interact_dry_model <-glm(any_exceed ~ sj_discharge + sj_discharge_interact + daily_rain + daily_rain_interact + deposition_occurred, data = goleta_beach_data_dry_72hrs, family = binomial)summary(gb_bt_interact_dry_model)
Call:
glm(formula = any_exceed ~ sj_discharge + sj_discharge_interact +
daily_rain + daily_rain_interact + deposition_occurred, family = binomial,
data = goleta_beach_data_dry_72hrs)
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -2.614e+00 1.123e-01 -23.276 <2e-16 ***
sj_discharge 9.878e-02 9.413e-02 1.049 0.294
sj_discharge_interact -1.254e-02 2.795e-02 -0.449 0.654
daily_rain 6.770e+01 8.448e+01 0.801 0.423
daily_rain_interact -2.228e+03 3.290e+03 -0.677 0.498
deposition_occurred 2.414e+00 2.421e-01 9.972 <2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 1061.39 on 1616 degrees of freedom
Residual deviance: 926.75 on 1611 degrees of freedom
(531 observations deleted due to missingness)
AIC: 938.75
Number of Fisher Scoring iterations: 5
Code
# Coefficients:# Estimate Std. Error z value Pr(>|z|) # (Intercept) -2.614e+00 1.123e-01 -23.276 <2e-16 ***# sj_discharge 9.878e-02 9.413e-02 1.049 0.294 # sj_discharge_interact -1.254e-02 2.795e-02 -0.449 0.654 # daily_rain 6.770e+01 8.448e+01 0.801 0.423 # daily_rain_interact -2.228e+03 3.290e+03 -0.677 0.498 # deposition_occurred 2.414e+00 2.421e-01 9.972 <2e-16 ***# ---# Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1# # (Dispersion parameter for binomial family taken to be 1)# # Null deviance: 1061.39 on 1616 degrees of freedom# Residual deviance: 926.75 on 1611 degrees of freedom# (531 observations deleted due to missingness)# AIC: 938.75# # Number of Fisher Scoring iterations: 5
Only deposition is statistically significant. None of the continuous variables or their interaction variables within the Box-Tidwell test are significant, so the assumption of linearity is met.
Multivariate logistic regression modeling
For any exceedances:
Code
# fit the multivariate logistic regression modelgoleta_any_exceed_dry_model <-glm(any_exceed ~ sj_discharge + daily_rain + deposition_occurred, data = goleta_beach_data_dry_72hrs, family = binomial)summary(goleta_any_exceed_dry_model)
Call:
glm(formula = any_exceed ~ sj_discharge + daily_rain + deposition_occurred,
family = binomial, data = goleta_beach_data_dry_72hrs)
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -2.64836 0.09009 -29.398 < 2e-16 ***
sj_discharge 0.05965 0.01725 3.459 0.000543 ***
daily_rain 2.91663 22.90500 0.127 0.898674
deposition_occurred 2.49558 0.23295 10.713 < 2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 1298.2 on 2147 degrees of freedom
Residual deviance: 1154.3 on 2144 degrees of freedom
AIC: 1162.3
Number of Fisher Scoring iterations: 5
Code
# Coefficients:# Estimate Std. Error z value Pr(>|z|) # (Intercept) -2.64836 0.09009 -29.398 < 2e-16 ***# sj_discharge 0.05965 0.01725 3.459 0.000543 ***# daily_rain 2.91663 22.90500 0.127 0.898674 # deposition_occurred 2.49558 0.23295 10.713 < 2e-16 ***# ---# Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1# # (Dispersion parameter for binomial family taken to be 1)# # Null deviance: 1298.2 on 2147 degrees of freedom# Residual deviance: 1154.3 on 2144 degrees of freedom# AIC: 1162.3# # Number of Fisher Scoring iterations: 5
This is exactly as I expected, in that the rain should not be a significant predictor of exceedances when there is no rain. Both deposition occurrence and San Jose Creek discharge are statistically significant.
The log odds of an exceedance increase by 0.05965 for each unit increase in San Jose Creek discharge. The log odds of an exceedance increase by 2.91663 for each unit increase in daily rain. When deposition occurs, the log odds of an exceedance increase by 2.49558.
For only enterococcus exceedances:
Code
# fit the multivariate logistic regression modelgoleta_enterococcus_dry_model <-glm(enterococcus_exceed ~ sj_discharge + daily_rain + deposition_occurred, data = goleta_beach_data_dry_72hrs, family = binomial)summary(goleta_enterococcus_dry_model)
Call:
glm(formula = enterococcus_exceed ~ sj_discharge + daily_rain +
deposition_occurred, family = binomial, data = goleta_beach_data_dry_72hrs)
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -3.82996 0.15424 -24.832 < 2e-16 ***
sj_discharge 0.04252 0.02200 1.933 0.0532 .
daily_rain -1.23225 40.80533 -0.030 0.9759
deposition_occurred 2.25276 0.31813 7.081 1.43e-12 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 582.74 on 2147 degrees of freedom
Residual deviance: 527.85 on 2144 degrees of freedom
AIC: 535.85
Number of Fisher Scoring iterations: 6
Code
# Coefficients:# Estimate Std. Error z value Pr(>|z|) # (Intercept) -3.82996 0.15424 -24.832 < 2e-16 ***# sj_discharge 0.04252 0.02200 1.933 0.0532 . # daily_rain -1.23225 40.80533 -0.030 0.9759 # deposition_occurred 2.25276 0.31813 7.081 1.43e-12 ***# ---# Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1# # (Dispersion parameter for binomial family taken to be 1)# # Null deviance: 582.74 on 2147 degrees of freedom# Residual deviance: 527.85 on 2144 degrees of freedom# AIC: 535.85# # Number of Fisher Scoring iterations: 6
Maintaining an alpha of 0.05, only deposition_occurred is statistically significant for enterococcus exceedances. For each unit increase in San Jose Creek discharge, the log odds of an enterococcus exceedance increase by 0.04252. For each unit increase in daily rain, the log odds of an enterococcus exceedance decrease by 1.23225. When deposition occurs, the log odds of an enterococcus exceedance increase by 2.25276.
This is interesting because in the wet model, discharge was significant for enterococcus exceedances. This could be due to the fact that the creek discharge is not as high when there is no rain, so it is not as significant of a predictor. However, deposition is still significant.
For only fecal coliform exceedances:
Code
# fit the multivariate logistic regression modelgoleta_fecal_coliforms_dry_model <-glm(fecal_coliforms_exceed ~ sj_discharge + daily_rain + deposition_occurred, data = goleta_beach_data_dry_72hrs, family = binomial)summary(goleta_fecal_coliforms_dry_model)
Call:
glm(formula = fecal_coliforms_exceed ~ sj_discharge + daily_rain +
deposition_occurred, family = binomial, data = goleta_beach_data_dry_72hrs)
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -4.765e+00 4.171e-01 -11.425 < 2e-16 ***
sj_discharge -1.947e+01 1.178e+01 -1.654 0.0982 .
daily_rain -1.427e+03 1.195e+05 -0.012 0.9905
deposition_occurred 3.931e+00 6.478e-01 6.069 1.29e-09 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 168.84 on 2147 degrees of freedom
Residual deviance: 122.98 on 2144 degrees of freedom
AIC: 130.98
Number of Fisher Scoring iterations: 19
Code
# Coefficients:# Estimate Std. Error z value Pr(>|z|) # (Intercept) -4.765e+00 4.171e-01 -11.425 < 2e-16 ***# sj_discharge -1.947e+01 1.178e+01 -1.654 0.0982 . # daily_rain -1.427e+03 1.195e+05 -0.012 0.9905 # deposition_occurred 3.931e+00 6.478e-01 6.069 1.29e-09 ***# ---# Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1# # (Dispersion parameter for binomial family taken to be 1)# # Null deviance: 168.84 on 2147 degrees of freedom# Residual deviance: 122.98 on 2144 degrees of freedom# AIC: 130.98# # Number of Fisher Scoring iterations: 19
For fecal coliform exceedances, only deposition_occurred is statistically significant. For each unit increase in San Jose Creek discharge, the log odds of a fecal coliform exceedance decrease by 19.47. For each unit increase in daily rain, the log odds of a fecal coliform exceedance decrease by 1427. When deposition occurs, the log odds of a fecal coliform exceedance increase by 3.931. This can be interpreted as a 51.1 times increase in the odds of a fecal coliform exceedance when deposition occurs, since exp(3.931) = 50.95791.
These are the same results as the wet model, except the coefficient for San Jose Creek discharge and rain here are negative, and the log odds are increased in the dry model. In the wet model, for each unit increase in San Jose Creek discharge, the log odds of a fecal coliform exceedance increase by 0.01122, for each unit increase in daily rain, the log odds of a fecal coliform exceedance increase by 1.54283, and when deposition occurs, the log odds of a fecal coliform exceedance increase by 2.55626.
For only ecoli exceedances (missing 2018 data):
Code
# fit the multivariate logistic regression modelgoleta_ecoli_dry_model <-glm(ecoli_exceed ~ sj_discharge + daily_rain + deposition_occurred, data = goleta_beach_data_dry_72hrs, family = binomial)summary(goleta_ecoli_dry_model)
Call:
glm(formula = ecoli_exceed ~ sj_discharge + daily_rain + deposition_occurred,
family = binomial, data = goleta_beach_data_dry_72hrs)
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -4.0535 0.1955 -20.735 <2e-16 ***
sj_discharge -0.3319 0.3242 -1.024 0.306
daily_rain 22.1899 37.3029 0.595 0.552
deposition_occurred -14.8761 1023.9761 -0.015 0.988
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 332.74 on 2147 degrees of freedom
Residual deviance: 327.40 on 2144 degrees of freedom
AIC: 335.4
Number of Fisher Scoring iterations: 18
Code
# Coefficients:# Estimate Std. Error z value Pr(>|z|) # (Intercept) -4.0535 0.1955 -20.735 <2e-16 ***# sj_discharge -0.3319 0.3242 -1.024 0.306 # daily_rain 22.1899 37.3029 0.595 0.552 # deposition_occurred -14.8761 1023.9761 -0.015 0.988 # ---# Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1# # (Dispersion parameter for binomial family taken to be 1)# # Null deviance: 332.74 on 2147 degrees of freedom# Residual deviance: 327.40 on 2144 degrees of freedom# AIC: 335.4# # Number of Fisher Scoring iterations: 18
None of the predictors are statistically significant for ecoli exceedances.
For only total coliform exceedances:
Code
# fit the multivariate logistic regression modelgoleta_total_coliforms_dry_model <-glm(total_coliforms_exceed ~ sj_discharge + daily_rain + deposition_occurred, data = goleta_beach_data_dry_72hrs, family = binomial)summary(goleta_total_coliforms_dry_model)
Call:
glm(formula = total_coliforms_exceed ~ sj_discharge + daily_rain +
deposition_occurred, family = binomial, data = goleta_beach_data_dry_72hrs)
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -3.60412 0.13796 -26.124 < 2e-16 ***
sj_discharge 0.06937 0.01837 3.777 0.000159 ***
daily_rain 2.30749 35.15259 0.066 0.947663
deposition_occurred 2.10372 0.29228 7.198 6.13e-13 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 696.38 on 2147 degrees of freedom
Residual deviance: 624.27 on 2144 degrees of freedom
AIC: 632.27
Number of Fisher Scoring iterations: 6
Code
# Coefficients:# Estimate Std. Error z value Pr(>|z|) # (Intercept) -3.60412 0.13796 -26.124 < 2e-16 ***# sj_discharge 0.06937 0.01837 3.777 0.000159 ***# daily_rain 2.30749 35.15259 0.066 0.947663 # deposition_occurred 2.10372 0.29228 7.198 6.13e-13 ***# ---# Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1# # (Dispersion parameter for binomial family taken to be 1)# # Null deviance: 696.38 on 2147 degrees of freedom# Residual deviance: 624.27 on 2144 degrees of freedom# AIC: 632.27# # Number of Fisher Scoring iterations: 6
For total coliform exceedances, discharge and deposition occurrence are statistically significant. For each unit increase in San Jose Creek discharge, the log odds of a total coliform exceedance increase by 0.06937. For each unit increase in daily rain, the log odds of a total coliform exceedance increase by 2.30749. When deposition occurs, the log odds of a total coliform exceedance increase by 2.10372.
This is a similar result to the wet model, except that daily rain is not significant in the dry model. This is a good gut check, as rain should not be a significant predictor of total coliform exceedances when there is no rain.
Carpinteria Beach
Code
# filter the data to only include days with less than 0.1 inches of raincarpenteria_beach_data_dry <- carpenteria_beach_data %>%filter(daily_rain <0.04)# accounting for lag, filtering to only observations of dry days that have been dry for the past 72 hours (aka three days)carpenteria_beach_data_dry_72hrs <- carpenteria_beach_data_dry %>%mutate(daily_rain_lag1 =lag(daily_rain, 1), # using dplyr's lag function to create lagged variablesdaily_rain_lag2 =lag(daily_rain, 2),daily_rain_lag3 =lag(daily_rain, 3)) %>%filter(daily_rain_lag1 <0.04) %>%filter(daily_rain_lag2 <0.04) %>%filter(daily_rain_lag3 <0.04)
Testing assumptions for a multiple logistic regression model:
Code
# check for correlation between predictorscb_dry_cor <-cor(carpenteria_beach_data_dry_72hrs[, c("carp_discharge", "daily_rain", "deposition_occurred")])cb_dry_cor
# assess linearity between continuous predictors and log odds of the outcome# perform Box-Tidwell test, interaction terms are the continuous variablescb_bt_interact_dry_model <-glm(any_exceed ~ carp_discharge + carp_discharge_interact + daily_rain + daily_rain_interact + deposition_occurred, data = carpenteria_beach_data_dry_72hrs, family = binomial)summary(cb_bt_interact_dry_model)
Call:
glm(formula = any_exceed ~ carp_discharge + carp_discharge_interact +
daily_rain + daily_rain_interact + deposition_occurred, family = binomial,
data = carpenteria_beach_data_dry_72hrs)
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -3.240e+00 2.092e+00 -1.549 0.121
carp_discharge -1.427e-01 2.206e-01 -0.647 0.518
carp_discharge_interact 3.870e-02 5.851e-02 0.661 0.508
daily_rain 1.744e+02 2.792e+02 0.625 0.532
daily_rain_interact -2.716e+03 7.609e+03 -0.357 0.721
deposition_occurred 9.466e-01 7.882e-01 1.201 0.230
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 80.793 on 75 degrees of freedom
Residual deviance: 72.798 on 70 degrees of freedom
(69 observations deleted due to missingness)
AIC: 84.798
Number of Fisher Scoring iterations: 4
Code
# Coefficients:# Estimate Std. Error z value Pr(>|z|)# (Intercept) -3.240e+00 2.092e+00 -1.549 0.121# carp_discharge -1.427e-01 2.206e-01 -0.647 0.518# carp_discharge_interact 3.870e-02 5.851e-02 0.661 0.508# daily_rain 1.744e+02 2.792e+02 0.625 0.532# daily_rain_interact -2.716e+03 7.609e+03 -0.357 0.721# deposition_occurred 9.466e-01 7.882e-01 1.201 0.230# # (Dispersion parameter for binomial family taken to be 1)# # Null deviance: 80.793 on 75 degrees of freedom# Residual deviance: 72.798 on 70 degrees of freedom# (69 observations deleted due to missingness)# AIC: 84.798# # Number of Fisher Scoring iterations: 4
Wow, none of the predictors are statistically significant! This is a good sign that the assumption of linearity is met.
Multivariate logistic regression modeling
For any exceedances:
Code
# fit the multivariate logistic regression modelcarpenteria_any_exceed_dry_model <-glm(any_exceed ~ carp_discharge + daily_rain + deposition_occurred, data = carpenteria_beach_data_dry_72hrs, family = binomial)summary(carpenteria_any_exceed_dry_model)
Call:
glm(formula = any_exceed ~ carp_discharge + daily_rain + deposition_occurred,
family = binomial, data = carpenteria_beach_data_dry_72hrs)
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -2.92464 0.62763 -4.660 3.16e-06 ***
carp_discharge 0.04447 0.01732 2.568 0.0102 *
daily_rain 22.36139 29.57021 0.756 0.4495
deposition_occurred 1.29655 0.70399 1.842 0.0655 .
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 108.776 on 144 degrees of freedom
Residual deviance: 93.761 on 141 degrees of freedom
AIC: 101.76
Number of Fisher Scoring iterations: 5
Code
# Coefficients:# Estimate Std. Error z value Pr(>|z|) # (Intercept) -2.92464 0.62763 -4.660 3.16e-06 ***# carp_discharge 0.04447 0.01732 2.568 0.0102 * # daily_rain 22.36139 29.57021 0.756 0.4495 # deposition_occurred 1.29655 0.70399 1.842 0.0655 . # ---# Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1# # (Dispersion parameter for binomial family taken to be 1)# # Null deviance: 108.776 on 144 degrees of freedom# Residual deviance: 93.761 on 141 degrees of freedom# AIC: 101.76# # Number of Fisher Scoring iterations: 5
It appears that only Carpinteria Creek discharge is statistically significant for any exceedances. If the alpha value is increased to 0.1, then deposition_occurred is also significant.
For each unit increase in Carpinteria Creek discharge, the log odds of an exceedance increase by 0.04447. For each unit increase in daily rain, the log odds of an exceedance increase by 22.36139. When deposition occurs, the log odds of an exceedance increase by 1.29655.
In the wet model, daily rain was significant, but in the dry model, it is not. This is a good sign that the model is working as expected. However, in the wet model the deposition was significant, but in the dry model it is not.
For only enterococcus exceedances:
Code
# fit the multivariate logistic regression modelcarpenteria_enterococcus_dry_model <-glm(enterococcus_exceed ~ carp_discharge + daily_rain + deposition_occurred, data = carpenteria_beach_data_dry_72hrs, family = binomial)summary(carpenteria_enterococcus_dry_model)
Call:
glm(formula = enterococcus_exceed ~ carp_discharge + daily_rain +
deposition_occurred, family = binomial, data = carpenteria_beach_data_dry_72hrs)
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -3.13542 0.95854 -3.271 0.00107 **
carp_discharge 0.05017 0.02762 1.816 0.06935 .
daily_rain -22.16000 58.69514 -0.378 0.70577
deposition_occurred 0.34160 1.17971 0.290 0.77215
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 56.088 on 144 degrees of freedom
Residual deviance: 50.883 on 141 degrees of freedom
AIC: 58.883
Number of Fisher Scoring iterations: 6
Code
# Coefficients:# Estimate Std. Error z value Pr(>|z|) # (Intercept) -3.13542 0.95854 -3.271 0.00107 **# carp_discharge 0.05017 0.02762 1.816 0.06935 . # daily_rain -22.16000 58.69514 -0.378 0.70577 # deposition_occurred 0.34160 1.17971 0.290 0.77215 # ---# Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1# # (Dispersion parameter for binomial family taken to be 1)# # Null deviance: 56.088 on 144 degrees of freedom# Residual deviance: 50.883 on 141 degrees of freedom# AIC: 58.883# # Number of Fisher Scoring iterations: 6
None of the predictors are statistically significant for enterococcus exceedances. This makes sense, because in the wet model only daily rain was a significant predictor for enterococcus exceedances.
For only fecal coliform exceedances:
Code
# fit the multivariate logistic regression modelcarpenteria_fecal_coliforms_dry_model <-glm(fecal_coliforms_exceed ~ carp_discharge + daily_rain + deposition_occurred, data = carpenteria_beach_data_dry_72hrs, family = binomial)summary(carpenteria_fecal_coliforms_dry_model)
Call:
glm(formula = fecal_coliforms_exceed ~ carp_discharge + daily_rain +
deposition_occurred, family = binomial, data = carpenteria_beach_data_dry_72hrs)
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -2.657e+01 5.986e+04 0 1
carp_discharge -6.584e-17 2.494e+03 0 1
daily_rain -3.913e-13 3.183e+06 0 1
deposition_occurred -5.364e-15 9.919e+04 0 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 0.0000e+00 on 144 degrees of freedom
Residual deviance: 8.4123e-10 on 141 degrees of freedom
AIC: 8
Number of Fisher Scoring iterations: 25
Code
## WARNING!!!! There are only 14 observations, which I suspect is why the model is not converging.
There are only 14 observations for fecal coliform exceedances, which is why the model is not converging. This is not enough data to make any conclusions.
For only ecoli exceedances:
Code
# fit the multivariate logistic regression modelcarpenteria_ecoli_dry_model <-glm(ecoli_exceed ~ carp_discharge + daily_rain + deposition_occurred, data = carpenteria_beach_data_dry_72hrs, family = binomial)summary(carpenteria_ecoli_dry_model)
Call:
glm(formula = ecoli_exceed ~ carp_discharge + daily_rain + deposition_occurred,
family = binomial, data = carpenteria_beach_data_dry_72hrs)
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -2.657e+01 5.986e+04 0 1
carp_discharge -6.584e-17 2.494e+03 0 1
daily_rain -3.913e-13 3.183e+06 0 1
deposition_occurred -5.364e-15 9.919e+04 0 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 0.0000e+00 on 144 degrees of freedom
Residual deviance: 8.4123e-10 on 141 degrees of freedom
AIC: 8
Number of Fisher Scoring iterations: 25
Code
# there are 34 observations, but the model did not converge
The model did not converge, likely because there are only 34 observations. This is not enough data to make any conclusions.
For only total coliform exceedances:
Code
# fit the multivariate logistic regression modelcarpenteria_total_coliforms_dry_model <-glm(total_coliforms_exceed ~ carp_discharge + daily_rain + deposition_occurred, data = carpenteria_beach_data_dry_72hrs, family = binomial)summary(carpenteria_total_coliforms_dry_model)
Call:
glm(formula = total_coliforms_exceed ~ carp_discharge + daily_rain +
deposition_occurred, family = binomial, data = carpenteria_beach_data_dry_72hrs)
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -3.83501 0.82708 -4.637 3.54e-06 ***
carp_discharge 0.02791 0.01801 1.550 0.1212
daily_rain 41.60966 34.51389 1.206 0.2280
deposition_occurred 1.72673 0.83178 2.076 0.0379 *
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 77.878 on 144 degrees of freedom
Residual deviance: 68.470 on 141 degrees of freedom
AIC: 76.47
Number of Fisher Scoring iterations: 6
Code
# Coefficients:# Estimate Std. Error z value Pr(>|z|) # (Intercept) -3.83501 0.82708 -4.637 3.54e-06 ***# carp_discharge 0.02791 0.01801 1.550 0.1212 # daily_rain 41.60966 34.51389 1.206 0.2280 # deposition_occurred 1.72673 0.83178 2.076 0.0379 * # ---# Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1# # (Dispersion parameter for binomial family taken to be 1)# # Null deviance: 77.878 on 144 degrees of freedom# Residual deviance: 68.470 on 141 degrees of freedom# AIC: 76.47# # Number of Fisher Scoring iterations: 6
Oooo, interesting! Deposition occurrence is the only statistically significant predictor for total coliform exceedances. For each unit increase in Carpinteria Creek discharge, the log odds of a total coliform exceedance increase by 0.02791. For each unit increase in daily rain, the log odds of a total coliform exceedance increase by 41.60966. When deposition occurs, the log odds of a total coliform exceedance increase by 1.72673.
This is cool because in the wet model, only daily rain was significant at an alpha value of 0.05, though deposition would have been signficiant with an alpha of 0.1. In the dry model, daily rain is not significant, but deposition is.
I am wary of the very high coefficient for daily rain, but I think this is due to the small sample size (48) … I will need to look into this further.
Arroyo Burro Beach
Code
# filter the data to only include days with less than 0.1 inches of rainab_beach_data_dry <- ab_beach_data %>%filter(daily_rain <0.04)# accounting for lag, filtering to only observations of dry days that have been dry for the past 72 hours (aka three days)ab_beach_data_dry_72hrs <- ab_beach_data_dry %>%mutate(daily_rain_lag1 =lag(daily_rain, 1), # using dplyr's lag function to create lagged variablesdaily_rain_lag2 =lag(daily_rain, 2),daily_rain_lag3 =lag(daily_rain, 3)) %>%filter(daily_rain_lag1 <0.04) %>%filter(daily_rain_lag2 <0.04) %>%filter(daily_rain_lag3 <0.04)
Testing assumptions for a multiple logistic regression model:
Code
ab_beach_data_dry_72hrs$ab_discharge_cfs <-as.numeric(ab_beach_data_dry_72hrs$ab_discharge_cfs)# check for multicollinearityarroyo_burro_any_exceed_dry_model <-glm(any_exceed ~ ab_discharge_cfs + daily_rain, data = ab_beach_data_dry_72hrs, family = binomial)arroyo_burro_any_exceed_dry_model
Call: glm(formula = any_exceed ~ ab_discharge_cfs + daily_rain, family = binomial,
data = ab_beach_data_dry_72hrs)
Coefficients:
(Intercept) ab_discharge_cfs daily_rain
-2.11934 0.07004 25.58958
Degrees of Freedom: 1240 Total (i.e. Null); 1238 Residual
(959 observations deleted due to missingness)
Null Deviance: 886.8
Residual Deviance: 881.5 AIC: 887.5
Code
# Coefficients:# (Intercept) ab_discharge_cfs daily_rain # -2.11934 0.07004 25.58958# HIGH multicolinearity....ab_any_exceed_log_dry_model <-glm(any_exceed ~ log_discharge + daily_rain, data = ab_beach_data_dry_72hrs, family = binomial)vif(ab_any_exceed_log_dry_model)
log_discharge daily_rain
1.002393 1.002393
Code
# log_discharge daily_rain # 1.002393 1.002393 # no issue once the discharge is transformed!
No multicollinearity between the predictors once the discharge is transformed.
Code
# assess linearity between continuous predictors and log odds of the outcome# perform Box-Tidwell test, interaction terms are the continuous variablesab_bt_interact_dry_model <-glm(any_exceed ~ log_discharge + log_discharge_interact + daily_rain + daily_rain_interact, data = ab_beach_data_dry_72hrs, family = binomial)summary(ab_bt_interact_dry_model)
Call:
glm(formula = any_exceed ~ log_discharge + log_discharge_interact +
daily_rain + daily_rain_interact, family = binomial, data = ab_beach_data_dry_72hrs)
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -2.6189 0.2022 -12.950 <2e-16 ***
log_discharge 1.6354 0.6373 2.566 0.0103 *
log_discharge_interact -0.9001 0.5518 -1.631 0.1029
daily_rain 54.4597 84.6698 0.643 0.5201
daily_rain_interact -1750.7777 3815.1913 -0.459 0.6463
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 886.85 on 1240 degrees of freedom
Residual deviance: 871.06 on 1236 degrees of freedom
(959 observations deleted due to missingness)
AIC: 881.06
Number of Fisher Scoring iterations: 5
Code
# Coefficients:# Estimate Std. Error z value Pr(>|z|) # (Intercept) -2.6189 0.2022 -12.950 <2e-16 ***# log_discharge 1.6354 0.6373 2.566 0.0103 * # log_discharge_interact -0.9001 0.5518 -1.631 0.1029 # daily_rain 54.4597 84.6698 0.643 0.5201 # daily_rain_interact -1750.7777 3815.1913 -0.459 0.6463 # ---# Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1# # (Dispersion parameter for binomial family taken to be 1)# # Null deviance: 886.85 on 1240 degrees of freedom# Residual deviance: 871.06 on 1236 degrees of freedom# (959 observations deleted due to missingness)# AIC: 881.06# # Number of Fisher Scoring iterations: 5
So, it appears that log_discharge is statistically significant, but daily rain is not. The interaction term for log_discharge (log_discharge_interact) is not significant (p = 0.1029), suggesting the relationship between log-transformed discharge and the log odds of exceedance is approximately linear.
Multivariate logistic regression modeling
For any exceedances:
Code
# fit the multivariate logistic regression modelab_any_exceed_dry_model <-glm(any_exceed ~ log_discharge + daily_rain, data = ab_beach_data_dry_72hrs, family = binomial)summary(ab_any_exceed_dry_model)
Call:
glm(formula = any_exceed ~ log_discharge + daily_rain, family = binomial,
data = ab_beach_data_dry_72hrs)
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -2.3645 0.1316 -17.966 <2e-16 ***
log_discharge 0.6030 0.1668 3.615 0.0003 ***
daily_rain 20.3462 29.5277 0.689 0.4908
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 886.85 on 1240 degrees of freedom
Residual deviance: 874.21 on 1238 degrees of freedom
(959 observations deleted due to missingness)
AIC: 880.21
Number of Fisher Scoring iterations: 4
Code
# Coefficients:# Estimate Std. Error z value Pr(>|z|) # (Intercept) -2.3645 0.1316 -17.966 <2e-16 ***# log_discharge 0.6030 0.1668 3.615 0.0003 ***# daily_rain 20.3462 29.5277 0.689 0.4908 # ---# Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1# # (Dispersion parameter for binomial family taken to be 1)# # Null deviance: 886.85 on 1240 degrees of freedom# Residual deviance: 874.21 on 1238 degrees of freedom# (959 observations deleted due to missingness)# AIC: 880.21# # Number of Fisher Scoring iterations: 4
Only log_discharge is statistically significant for any exceedances. For each unit increase in log-transformed discharge, the log odds of an exceedance increase by 0.6030. For each unit increase in daily rain, the log odds of an exceedance increase by 20.3462.
This is interesting because in the wet model, both discharge and rain were significant predictors of any exceedances. In the dry model, only discharge is significant.
For only enterococcus exceedances:
Code
# fit the multivariate logistic regression modelab_enterococcus_dry_model <-glm(enterococcus_exceed ~ log_discharge + daily_rain, data = ab_beach_data_dry_72hrs, family = binomial)summary(ab_enterococcus_dry_model)
Call:
glm(formula = enterococcus_exceed ~ log_discharge + daily_rain,
family = binomial, data = ab_beach_data_dry_72hrs)
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -3.3185 0.2101 -15.792 <2e-16 ***
log_discharge 0.2704 0.2854 0.947 0.343
daily_rain 24.8899 44.9868 0.553 0.580
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 419.12 on 1240 degrees of freedom
Residual deviance: 417.97 on 1238 degrees of freedom
(959 observations deleted due to missingness)
AIC: 423.97
Number of Fisher Scoring iterations: 6
Code
# Coefficients:# Estimate Std. Error z value Pr(>|z|) # (Intercept) -3.3185 0.2101 -15.792 <2e-16 ***# log_discharge 0.2704 0.2854 0.947 0.343 # daily_rain 24.8899 44.9868 0.553 0.580 # ---# Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1# # (Dispersion parameter for binomial family taken to be 1)# # Null deviance: 419.12 on 1240 degrees of freedom# Residual deviance: 417.97 on 1238 degrees of freedom# (959 observations deleted due to missingness)# AIC: 423.97# # Number of Fisher Scoring iterations: 6
None of the predictors are statistically significant for enterococcus exceedances. This is interesting because in the wet model, discharge was a significant predictor of enterococcus exceedances, but in the dry model it is not. Perhaps rain is a confounder in the wet model.
For only fecal coliform exceedances:
Code
# fit the multivariate logistic regression modelab_fecal_coliforms_dry_model <-glm(fecal_coliforms_exceed ~ log_discharge + daily_rain, data = ab_beach_data_dry_72hrs, family = binomial)summary(ab_fecal_coliforms_dry_model)
Call:
glm(formula = fecal_coliforms_exceed ~ log_discharge + daily_rain,
family = binomial, data = ab_beach_data_dry_72hrs)
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -5.207e+00 5.989e-01 -8.694 <2e-16 ***
log_discharge -1.978e-01 9.957e-01 -0.199 0.843
daily_rain -1.354e+03 1.417e+05 -0.010 0.992
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 75.954 on 1240 degrees of freedom
Residual deviance: 75.558 on 1238 degrees of freedom
(959 observations deleted due to missingness)
AIC: 81.558
Number of Fisher Scoring iterations: 18
Code
# Coefficients:# Estimate Std. Error z value Pr(>|z|) # (Intercept) -5.207e+00 5.989e-01 -8.694 <2e-16 ***# log_discharge -1.978e-01 9.957e-01 -0.199 0.843 # daily_rain -1.354e+03 1.417e+05 -0.010 0.992 # ---# Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1# # (Dispersion parameter for binomial family taken to be 1)# # Null deviance: 75.954 on 1240 degrees of freedom# Residual deviance: 75.558 on 1238 degrees of freedom# (959 observations deleted due to missingness)# AIC: 81.558# # Number of Fisher Scoring iterations: 18
Nothing is statistically significant for fecal coliform exceedances. In the wet model, only discharge was significant.
For only ecoli exceedances:
Code
# fit the multivariate logistic regression modelab_ecoli_dry_model <-glm(ecoli_exceed ~ log_discharge + daily_rain, data = ab_beach_data_dry_72hrs, family = binomial)summary(ab_ecoli_dry_model)
Call:
glm(formula = ecoli_exceed ~ log_discharge + daily_rain, family = binomial,
data = ab_beach_data_dry_72hrs)
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -4.7498 0.3799 -12.503 <2e-16 ***
log_discharge 0.5513 0.4400 1.253 0.210
daily_rain 63.6080 51.5227 1.235 0.217
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 162.29 on 1240 degrees of freedom
Residual deviance: 159.74 on 1238 degrees of freedom
(959 observations deleted due to missingness)
AIC: 165.74
Number of Fisher Scoring iterations: 7
Code
# Coefficients:# Estimate Std. Error z value Pr(>|z|) # (Intercept) -4.7498 0.3799 -12.503 <2e-16 ***# log_discharge 0.5513 0.4400 1.253 0.210 # daily_rain 63.6080 51.5227 1.235 0.217 # ---# Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1# # (Dispersion parameter for binomial family taken to be 1)# # Null deviance: 162.29 on 1240 degrees of freedom# Residual deviance: 159.74 on 1238 degrees of freedom# (959 observations deleted due to missingness)# AIC: 165.74# # Number of Fisher Scoring iterations: 7
Again, nothing is statistically significant for ecoli exceedances. In the wet model, only discharge was significant.
For only total coliform exceedances:
Code
# fit the multivariate logistic regression modelab_total_coliforms_dry_model <-glm(total_coliforms_exceed ~ log_discharge + daily_rain, data = ab_beach_data_dry_72hrs, family = binomial)summary(ab_total_coliforms_dry_model)
Call:
glm(formula = total_coliforms_exceed ~ log_discharge + daily_rain,
family = binomial, data = ab_beach_data_dry_72hrs)
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -3.1931 0.1791 -17.828 < 2e-16 ***
log_discharge 0.7359 0.2044 3.600 0.000319 ***
daily_rain -0.7220 45.1669 -0.016 0.987246
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 549.71 on 1240 degrees of freedom
Residual deviance: 538.60 on 1238 degrees of freedom
(959 observations deleted due to missingness)
AIC: 544.6
Number of Fisher Scoring iterations: 5
Code
# Coefficients:# Estimate Std. Error z value Pr(>|z|) # (Intercept) -3.1931 0.1791 -17.828 < 2e-16 ***# log_discharge 0.7359 0.2044 3.600 0.000319 ***# daily_rain -0.7220 45.1669 -0.016 0.987246 # ---# Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1# # (Dispersion parameter for binomial family taken to be 1)# # Null deviance: 549.71 on 1240 degrees of freedom# Residual deviance: 538.60 on 1238 degrees of freedom# (959 observations deleted due to missingness)# AIC: 544.6# # Number of Fisher Scoring iterations: 5
Wow, here the discharge is significant for total coliform exceedances, but daily rain is not. This is the same outcome as the wet model. It would be good to consider why total coliforms might still be significant in the dry model, but not the other bacteria types other than any exceedances.
When there is discharge, it increases the log odds of a total coliform exceedance by 0.7359. When there is daily rain, it decreases the log odds of a total coliform exceedance by 0.7220.